Abstract


The central component of standard approaches to compile-time program analysis
is an abstract domain for approximating program values. Importantly, the domain
must be chosen so that an iterative fixed point computation over the domain
terminates. This requirement represents a substantial restriction on the accuracy of
the analysis. Furthermore, it leads to complex and often chaotic behavior.

We present an alternative approach to program analysis, called set based analysis.
A key feature of set based analysis is that reasoning about a program’s run-time
behavior is reduced to reasoning about constraints on sets of program values. Set
based analysis incorporates just a single notion of approximation: all dependencies
arising from the treatment of program variables are ignored. The main advantage of
set based analysis is improved accuracy, due to the absence of an abstract domain.
Additionally, the use of a very simple and uniform notion of approximation leads to
program analysis that is easier to understand and less sensitive to minor program
modifications.

The core part of this thesis presents an algorithm for set based analysis.
Importantly, the standard iterative fixed point algorithms used in the program analysis
literature cannot be used for set based analysis (they do not terminate). We
therefore employ a fundamentally different technique, based on the use of constraints
on sets of values. Using these constraints, we develop algorithms for the analysis
of logic, imperative and functional languages (the underlying program values
in each case are data structures). A prototype implementation is described.
Although a straightforward implementation of the set constraint algorithm leads to
very poor performance, very substantial improvements have been obtained using
appropriate representation schemes and minimization techniques. This prototype
provides strong evidence that practical analysis based on set based techniques is
within reach.

An underlying philosophy of set based analysis is the separation of the definition of
program approximations from algorithmic considerations. This is reflected in the
use of constraints to define program approximation, and set constraint algorithms
to compute it. The constraints used form a flexible and declarative intermediate
language for defining and reasoning about program approximations.
Chapter 1

Thesis  Summary 


We describe an approach to compile-time program analysis which essentially
treats program variables as sets of values. Specifically, for each program
variable X and program point $\mu$, we introduce a set variable $X^\mu$ which is
intended to capture the set of values for X at point $\mu$. Then, using these set
variables, set constraints are written to capture relationships inherent in each
program statement. For example, consider an imperative program statement¹
X := cons(Y, X). If $\mu$ is the program point just before this statement and
$\nu$ is the program point just after this statement, then we introduce set
variables $X^\mu$, $Y^\mu$, $X^\nu$ and $Y^\nu$ to capture the values of X and Y
at points $\mu$ and $\nu$ respectively, and we write the constraints

$$X^\nu \supseteq cons(Y^\mu, X^\mu)$$

$$Y^\nu \supseteq Y^\mu$$

The first constraint specifies that $X^\nu$ must contain all values of the form
cons(v1, v2) such that v1 is contained in $Y^\mu$ and v2 is contained in $X^\mu$.
The second specifies that $Y^\nu$ must contain all values from $Y^\mu$. Note that
these constraints approximate the variable relationships contained in X :=
cons(Y, X) by ignoring dependencies between the values of X and Y.
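
The effect of ignoring inter-variable dependencies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample environments, the tuple encoding of cons values, and the helper name `cons_set` are hypothetical and do not come from the thesis.

```python
from itertools import product

def cons_set(heads, tails):
    """All values cons(v1, v2) with v1 in heads and v2 in tails."""
    return {("cons", h, t) for h, t in product(heads, tails)}

# Hypothetical environments (Y, X) reaching the point just before X := cons(Y, X).
envs_before = {("a", "nil"), ("b", ("cons", "c", "nil"))}

# Exact values of X after the statement: Y and X come from the same environment.
exact_x_after = {("cons", y, x) for (y, x) in envs_before}

# Set based view: each variable is projected to its own set of values,
# so the pairing between Y-values and X-values is forgotten.
y_mu = {y for (y, _) in envs_before}
x_mu = {x for (_, x) in envs_before}
x_nu = cons_set(y_mu, x_mu)   # least set satisfying X^nu ⊇ cons(Y^mu, X^mu)
y_nu = set(y_mu)              # least set satisfying Y^nu ⊇ Y^mu

print(len(exact_x_after), len(x_nu))
print(exact_x_after <= x_nu)
```

In this sketch the set based result contains every exact value, plus the extra combinations that arise from pairing each Y-value with each X-value.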

Such a construction of constraints may be used to reduce the problem of
obtaining information about the run-time values of each program variable
into the problem of reasoning about a collection of constraints between sets
of program values. If this reduction process is to be used as the basis of
a program analysis, two issues must be addressed. First, what is the
relationship between the program and the corresponding set constraints (in
particular, in what sense are the constraints “correct”)? Second, how can
we “solve” the constraints to obtain useful information about a program’s
run-time behavior? In essence, these are the two main components of the
thesis.

¹ cons denotes the list construction function.

To address the first question, we show that the use of set constraints
corresponds to a very simple and intuitive notion of program approximation
which can be characterized as follows: the one and only approximation made
is that all inter-variable dependencies are ignored. This correspondence is
established for a variety of languages and operational semantics. In each
case, the basic plan is the same. Starting with an operational semantics,
a collecting semantics is defined by specifying a notion of “program point”
and then projecting the operational semantics onto these program points
(this simply involves collecting the environments for each program point).
This definition of collecting semantics is, by itself, of little value for defining
program approximation. We therefore develop a constraint formulation of
the collecting semantics: given a program, we show how environment
constraints may be constructed such that the least model of the environment
constraints corresponds to the program’s collecting semantics.

The advantage of the environment constraints is that they can be
reinterpreted in a number of different ways. Such an alternative
interpretation is used to define set based program approximation. In essence, we
show how the constraints may be interpreted so that inter-variable
dependencies may be ignored by treating program variables as sets. Then, the set
based approximation of a program is defined to be the smallest such “set”
interpretation which is a model of the constraints. That is, the least
(standard) model of the environment constraints gives the program’s collecting
semantics, and the least set model gives the set based approximation of the
program. Since a set based model is a (standard) model, it follows that the
set based approximation of a program is correct in the sense that it contains
the program’s collecting semantics.

To complete the correspondence of a program and its set constraints, we
show that, when interpreted using set based models, the environment
constraints for a program are equivalent to the set constraints for a program.
This proves two things. First, it shows that any model of the set constraints
provides a “safe” approximation of a program’s run-time behavior. Second,
it shows that the least model of the set constraints is the set based
approximation of a program, and so computing the least model of the set constraints
corresponds to an analysis of the program in which the only approximation
made is that all inter-variable dependencies are ignored.

The  central  technical  part  of  the  thesis  addresses  the  issue  of  “solving” 
set  constraints.  What  we  wish  to  compute  is  a  representation  of  the  least 
model  of  the  set  constraints  of  a  program  which  is  explicit  in  the  sense  that 
properties  of  the  model  are  immediately  evident.  For  example,  it  should  be 
straightforward  to  inspect  the  representation  to  find  out  if  the  set  assigned 
to  some  set  variable  consists  only  of  non-empty  lists.  To  achieve  this  goal, 
we  develop  an  algorithm  that  reads  in  a  collection  of  set  constraints  obtained 
from  a  program,  and  essentially  produces  a  description  of  the  least  model 
in  the  form  of  a  regular  term  grammar  for  each  set  variable  appearing  in 
the  constraints.  This  has  two  important  corollaries.  First  it  proves  that 
set  based  approximations  are  recursive  (in  the  sense  that  the  problem  of 
determining  elementhood  in  the  set  based  approximation  of  a  program  is 
decidable).  Second,  it  shows  that  set  based  program  approximations  are,  in 
fact,  regular  languages.  This  has  important  implementation  consequences. 
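
To make the idea of an explicit representation concrete, the following Python sketch encodes a small regular term grammar and decides membership of a ground term by structural recursion. The grammar, the nonterminal names (`Elem`, `List`, `NonEmpty`), and the tuple encoding of terms are assumptions chosen for illustration. They are not the output format of the algorithm developed in the thesis.

```python
# Hypothetical grammar: each nonterminal maps to a list of productions.
# A production is (constructor, [argument nonterminals]); atoms take no arguments.
GRAMMAR = {
    "Elem": [("a", []), ("b", [])],
    "List": [("nil", []), ("cons", ["Elem", "List"])],
    "NonEmpty": [("cons", ["Elem", "List"])],
}

def member(term, nonterminal, grammar=GRAMMAR):
    """Decide whether a ground term belongs to the set described by a nonterminal.

    Terms are tuples (constructor, arg1, ..., argn). The recursion follows the
    structure of the term, so it always terminates.
    """
    head, args = term[0], term[1:]
    for constructor, arg_nts in grammar[nonterminal]:
        if constructor == head and len(arg_nts) == len(args):
            if all(member(a, nt, grammar) for a, nt in zip(args, arg_nts)):
                return True
    return False

nil = ("nil",)
ab = ("cons", ("a",), ("cons", ("b",), nil))
print(member(ab, "NonEmpty"), member(nil, "NonEmpty"), member(nil, "List"))

# A property such as "only non-empty lists" can be read off the productions directly.
print(all(constructor == "cons" for constructor, _ in GRAMMAR["NonEmpty"]))
```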

The main contributions of the thesis can be loosely classified as: the
definition of set based program approximation; an algorithm for computing
set based program approximations; the development of effective
implementation strategies for set based analysis; and an outline of possible extensions
to set based analysis. We now expand on each of these in turn.


Definition  of  Set  Based  Approximation 

One of the advantages of set based analysis is that it employs a simple and
intuitive definition of program approximation. This is motivated by a desire
to separate the definition of program approximations from the algorithms
used to compute them, and leads to declarative program analysis which is
easier to understand and reason about. In contrast, most approaches in the
program analysis literature provide only an implicit algorithmic definition
of program approximation, and this typically results in analysis that is
difficult to predict.

Perhaps the main advantage of set based analysis is that the definition
of approximation is very uniform. In particular, there is no approximation
of the underlying domain of values. This leads to important advantages in
terms of accuracy (particularly for reasoning about data structures).
Moreover, we believe that this uniformity has implications for the stability and
scalability of the analysis. In contrast, abstract interpretation approaches
to program analysis employ an abstract domain and this domain must be
chosen so that the iterative fixed point computation terminates. This
requirement represents a substantial restriction on the accuracy of the analysis,
and often leads to complex and chaotic behavior.


The  Set  Based  Analysis  Algorithm 

The algorithm for computing set based approximations is based on the use
of set constraints. While constraints have been used before to reason about
programs, our algorithm advances previous work in a number of ways. First,
the constraints we use yield substantially more accurate program
approximations than the constraints in previous works. In particular, a number of
previous algorithms have excluded intersection. While the inclusion of
intersection significantly complicates the set constraint algorithms, it also leads
to much more expressive constraints. The previous works that have used
intersection employ an approximate form of union and have not provided
complete algorithms.

Second, we develop a general framework for using constraints to
analyze programs over a variety of languages and operational semantics. This
is carried out using a single constraint formalism. In contrast, previous
works have been tied to a particular language and operational semantics.
Moreover, we extend the application of constraints to the analysis of logic
programs under top-down operational models.

Third, we use constraints to compute a specific program approximation
which is defined independently of set constraints. In contrast, most
previous works that involve constraints (or, equivalently, various term grammar
formalisms) have employed constraints for the purpose of computing some
program approximation. Hence, proving the correctness of such algorithms
requires showing that the algorithm computes a safe approximation of the
program’s run-time behavior. However, our algorithm is designed to compute
exactly the set based approximation of a program, and so establishing the
algorithm’s correctness involves establishing equivalences, not just
containments.

Implementation  of  Set  Based  Analysis 

A straightforward implementation of the set constraint algorithm leads to
poor performance. However, we show that very substantial improvements
can be made using appropriate representation schemes and minimization
techniques. Although much work remains, we can now analyze programs of
the order of a hundred lines in about 5 seconds². This prototype provides
strong evidence that practical analysis based on set based techniques is
within reach.


Extensions  to  Set  Based  Analysis 

The  basic  set  based  analysis  definitions  and  algorithm  deal  with  computing 
an  approximation  of  the  run-time  values  of  program  variables.  However, 
the  ideas  of  set  based  analysis  are  not  restricted  to  this  kind  of  analysis. 
We  shall  sketch  extensions  to  set  based  analysis  for  modes  and  structure 
sharing,  functional  programs,  and  finally,  an  extension  for  capturing  some 
information  about  inter-variable  dependencies. 


² Using SML of New Jersey on a Sun Sparc 1+.




Chapter  2 

Introduction 


2.1  Program  Analysis 


Compile-time program analysis is about analyzing a program to determine
properties of its run-time behavior. Such analysis is a central component
of an optimizing compiler because information about run-time behavior is
a prerequisite for many code optimization techniques. One important kind of
analysis involves finding the possible run-time values of variables. That is,
it seeks to establish invariants such as “when this statement is executed,
the value of the variable X is always positive”. Such information can be
exploited during compilation to identify redundant tests or unreachable
statements in a program, or to find efficient representations of data. For example,
when compiling a statement that involves taking the square root of X, if it
is known that X will always be positive, then at run-time there is no need
to check that the square root operation is well defined. As another example,
consider compiling a statement that involves extracting the first element of
the list L. If it is known that L is always a non-empty list, then at run-time
there is no need to check that the extraction is well defined.

Finding the possible run-time values of variables is only one kind of useful
information that can be obtained from a program. Others include available
subexpressions, live variables, aliasing, sharing, strictness (for functional
programs) and modes (for logic programs). In this thesis we concentrate
on the run-time values of variables, with particular emphasis on accurate
treatment of data structures. This analysis is fundamental in the sense that
it is applicable to almost all classes of languages, and its results can be used
to directly improve other kinds of analysis.

    X := -5;
    ①
    while X ≤ 0 do
        ②
        X := X + 1;
        ③
    ④

    Program Point    Values for X
    ①                {-5}
    ②                {-5,...,0}
    ③                {-4,...,1}
    ④                {1}

Figure 2.1: Program 1 and Its Collecting Semantics

To see how information about the run-time values of variables may be
computed, consider the program in Figure 2.1. In this program, ①, ②, ③
and ④ are used as textual markers to indicate points in the program. Point
① indicates the point just after execution of the statement X := -5, points
② and ③ indicate the points immediately before and after the statement
X := X + 1, and point ④ indicates the point after execution of the entire
while-do statement. The accompanying table in Figure 2.1 shows the result
of tracing the execution of the program and collecting the values of the
variable X at each program point. This is often called a collecting semantics
since it collects together information about each point in the program. Such
an analysis is global in the sense that the program properties discovered
depend on the program as a whole; information about the execution of
a specific statement cannot be obtained by considering the statement in
isolation.
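
The table of Figure 2.1 can be reproduced by instrumenting the program directly. The Python sketch below does this under the assumption that the loop test is X ≤ 0, which is what the values listed for the point before X := X + 1 imply. The function name and the numbering of points from 1 to 4 are illustrative choices.

```python
from collections import defaultdict

def collect_program1():
    """Trace Program 1 and record the values of X seen at points 1 to 4."""
    seen = defaultdict(set)
    x = -5
    seen[1].add(x)           # point 1: just after X := -5
    while x <= 0:
        seen[2].add(x)       # point 2: just before X := X + 1
        x = x + 1
        seen[3].add(x)       # point 3: just after X := X + 1
    seen[4].add(x)           # point 4: after the whole while-do statement
    return dict(seen)

for point, values in sorted(collect_program1().items()):
    print(point, sorted(values))
```

Such a trace is only possible here because the program takes no input and terminates.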

Now, the information just computed in Figure 2.1 is exact in the sense
that it corresponds exactly to what happens when the program runs. In
general this is not possible because it entails solving the halting problem.
Moreover, even if a particular program can be analyzed exactly, it is
usually not feasible to do so. Approximation therefore plays a central role in
program analysis. What is desired is an approximation that is correct in
the sense that it contains all possible values encountered at run-time. In
other words, if a variable X takes a value v at some program point during
program execution, then we require that the set of values for X specified by
the approximation at this point must contain the value v.


2.2  Program  Analysis  by  Abstract  Interpretation 


Most of the approaches for computing such approximations employ abstract
interpretation or “pseudo-evaluation”, which is a technique that dates back
to the early 1960s and was used in some of the first compilers [29, 52].
The basic idea is to replace the underlying collection of computation values
by a collection of approximate values, and then replace computation by
approximate computation over the approximate values. In the case of the
previous program example, the underlying values of the computation are the
integers. Consider replacing this with the approximate values pos, neg and
int, respectively denoting the set of positive integers (including zero), the set
of negative integers (also including zero) and the set of all integers. Analysis
can then be performed by a “symbolic execution” of the program using these
approximate values. The effect of this is to simulate the program’s actual
executions. Specifically, information is associated with each program point
about the possible values for X that may be encountered at that point.
This information is then repeatedly propagated from one program point to
the next until propagation of information from each program point does not
yield any new information, and a “consistent” state is reached. We illustrate
this using the program from Figure 2.1. Initially the approximate value of
X at each program point is the empty set of values.

1. The value -5 is approximated by neg, and so the value of X at point
① is approximated by neg.

2. Propagate from ① to ②: since any value in neg is less than or equal
to 0, the value of X at ② is approximated by neg. Note that the
propagation of the information at ① to ④ does not yield anything
since none of the values approximated by neg satisfy X > 0.

3. Propagate from ② to ③: since adding 1 to neg gives int, the value of
X at ③ is approximated by int.

4. Propagate from ③ to ④: the information at ④ is updated to pos. Note
that propagation of information from ③ to ② does not yield any new
information.
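
The four steps above can be sketched as a small propagation loop in Python. This is a minimal illustration, not the formulation used in the literature cited here: the string encoding of the approximate values, the extra `bottom` value standing for the empty set, and the three lookup tables are assumptions. In particular, approximating the single value 0 (pos restricted by X ≤ 0) by neg is a choice made for this sketch.

```python
BOTTOM, NEG, POS, INT = "bottom", "neg", "pos", "int"

def join(a, b):
    """Least approximate value covering both a and b."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else INT

# Approximate versions of the operations used by Program 1.
ADD_ONE = {BOTTOM: BOTTOM, NEG: INT, POS: POS, INT: INT}
KEEP_LE_ZERO = {BOTTOM: BOTTOM, NEG: NEG, POS: NEG, INT: NEG}     # loop test X <= 0 holds
KEEP_GT_ZERO = {BOTTOM: BOTTOM, NEG: BOTTOM, POS: POS, INT: POS}  # loop test fails

def step(state):
    """One round of propagation between the four program points."""
    new = {
        1: NEG,  # the value -5 is approximated by neg
        2: join(KEEP_LE_ZERO[state[1]], KEEP_LE_ZERO[state[3]]),
        3: ADD_ONE[state[2]],
        4: join(KEEP_GT_ZERO[state[1]], KEEP_GT_ZERO[state[3]]),
    }
    return {p: join(state[p], new[p]) for p in state}

def analyze():
    """Propagate until a consistent state is reached."""
    state = {p: BOTTOM for p in (1, 2, 3, 4)}
    while True:
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt

print(analyze())
```

Because there are only four approximate values, each point can change only a bounded number of times, so the loop in `analyze` terminates.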
    L := a.b.nil;
    X := c;
    ①
    while (L ≠ nil) do
        ②
        X := car(L);
        L := cdr(L);
        ③
    ④

    Point    Environments
    ①        {[L↦a.b.nil, X↦c]}
    ②        {[L↦a.b.nil, X↦c], [L↦b.nil, X↦a]}
    ③        {[L↦b.nil, X↦a], [L↦nil, X↦b]}
    ④        {[L↦nil, X↦b]}

Figure 2.2: Program 2 and Its Collecting Semantics

A key observation is that if a consistent state is reached, and this is the
case after the four steps shown above, then the approximation obtained is a
correct approximation of the program’s actual executions in the sense that
it contains all of the actual run-time values for X. Now there are many
correct approximations of a program. For example, the approximation defined
by associating all possible values with each program point is conservative.
However, such an extreme approximation does not say anything useful about
the program. It is usually desirable to obtain the smallest (or most accurate)
approximation that is correct.

The example program in Figure 2.1 has only one variable, and so to
capture the executions of the program, it was sufficient to associate a set of
values for this variable with each program point. In the general case, where
there is more than one program variable, the only essential change is that a
set of environments is associated with each program point instead of a set
of values. To illustrate this, consider the program in Figure 2.2. The main
loop of this program computes the last element of the list L. The operators
car and cdr are the usual LISP operators for decomposing lists (car(L) gives
the first element of L and cdr(L) gives the “rest” of L). The symbols a,
b and c denote atomic constants, and a.b.nil denotes the list consisting of
a followed by b. Again ①, ②, ③ and ④ indicate points in the program.
The accompanying table in Figure 2.2 gives the environments encountered
at these points during program execution.

Consider an abstract interpretation over this program. Suppose that
the underlying collection of values for this program consists of constants a,
b and c, and lists constructed from these constants. One very simple
collection of approximate values for analyzing this program consists of const,
empty, non-empty, list and any, respectively denoting the set of constants,
the empty list, the set of non-empty lists, the set of all lists, and the set of
all values. Environments can be approximated using mappings from
variables into approximate values. For example, the approximate environment
[L↦list, X↦const] represents the set of environments in which L is bound
to some list and X is bound to some constant. To analyze Program 2
using this approximation of environments, note that at point ①, the
environment [L↦a.b.nil, X↦c] is approximated by [L↦non-empty, X↦const].
Propagating this information to point ② yields [L↦non-empty, X↦const],
and this in turn gives [L↦list, X↦any] at point ③. Further
propagation steps lead to the results summarized in Figure 2.3. Note that when
[L↦non-empty, X↦any] is added to point ②, [L↦non-empty, X↦const] is deleted
because it is subsumed by [L↦non-empty, X↦any].

    Point    Actual Environments       Abstract Interpretation
    ①        {[L↦a.b.nil, X↦c]}        {[L↦non-empty, X↦const]}
    ②        {[L↦a.b.nil, X↦c],        {[L↦non-empty, X↦any]}
              [L↦b.nil, X↦a]}
    ③        {[L↦b.nil, X↦a],          {[L↦list, X↦any]}
              [L↦nil, X↦b]}
    ④        {[L↦nil, X↦b]}            {[L↦empty, X↦any]}

Figure 2.3: Abstract Interpretation of Program 2

In summary, abstract interpretation involves replacing the underlying
values of the computation by a collection of approximate values in such a
way that program analysis can be performed by doing “approximate”
computation using the approximate values. The choice of approximate values
determines the character of the whole analysis. For example, the analysis
just given for Program 2 was not particularly interesting because the
collection of approximate values used was very simple. More accurate analysis is
possible if extra approximate values are used. For example, suppose that for
each natural number n, an extra value len(n) is added, denoting the set of
lists of length n. Analyzing Program 2 using this new collection of
approximate values starts by approximating the environment [L↦a.b.nil, X↦c]
at point ① by [L↦len(2), X↦const]. Repeatedly propagating this
information successively adds [L↦len(2), X↦const] to ②, [L↦len(1), X↦any]
to ③, [L↦len(1), X↦any] to ②, [L↦empty, X↦any] to ③, and finally
[L↦empty, X↦any] to ④. The result of this analysis is summarized in
Figure 2.4. Note that for some program points there are multiple abstract
environments, and this means that the analysis can capture some
information about the dependencies between the values of program variables. We
shall return to this issue later.

    Point    Actual Environments       Revised Abst. Int.
    ①        {[L↦a.b.nil, X↦c]}        {[L↦len(2), X↦const]}
    ②        {[L↦a.b.nil, X↦c],        {[L↦len(2), X↦const],
              [L↦b.nil, X↦a]}           [L↦len(1), X↦any]}
    ③        {[L↦b.nil, X↦a],          {[L↦len(1), X↦any],
              [L↦nil, X↦b]}             [L↦empty, X↦any]}
    ④        {[L↦nil, X↦b]}            {[L↦empty, X↦any]}

Figure 2.4: Revised Abstract Interpretation of Program 2

    L := b.nil;
    ①
    while true do
        ②
        L := a.L;
    ③

    Point    Environments              Abst. Int.
    ①        {[L↦b.nil]}               {[L↦len(1)]}
    ②        {[L↦b.nil],               {[L↦len(1)],
              [L↦a.b.nil],              [L↦len(2)],
              ...}                      [L↦len(3)],
                                        ...}
    ③        {}                        {}

Figure 2.5: Program 3 and Its Abstract Interpretation
2.3 Limitations of Abstract Interpretation


Clearly an arbitrary number of approximation values can be added, and
with each new approximation value, the ability of the abstract
interpretation to distinguish between sets of environments is enhanced, and so the
analysis improves in accuracy. However, a fundamental problem arises: as
more approximation values are added, the exhaustive propagation becomes
more expensive and eventually does not terminate. For example, consider
Program 3, which appears in Figure 2.5. Here points ①, ② and ③
respectively indicate the points just after the assignment L := b.nil, just before
execution of the L := a.L, and after the entire while-do loop. The collecting
semantics of this program associates an infinite collection of environments
with point ②. Now, when this program is analyzed using the collection of
approximate values just described, the following propagation steps are
performed. First, the environment [L↦b.nil] is approximated by [L↦len(1)] at
point ①. Propagating this information to point ② yields [L↦len(1)].
Subsequent propagation steps lead to the addition of [L↦len(2)], [L↦len(3)],
[L↦len(4)], ... to point ②. This propagation process does not terminate.
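
The non-terminating propagation for Program 3 can be observed with a bounded simulation. In the Python sketch below, an integer n stands for the approximate value len(n). The function name, the step budget `max_steps`, and the numbering of program points are assumptions made for illustration.

```python
def propagate_program3(max_steps):
    """Propagate len(n) values through Program 3 for at most max_steps rounds.

    Returns the set of lengths recorded at point 2 (just before L := a.L)
    and a flag saying whether a consistent state was reached.
    """
    point1 = {1}      # [L -> b.nil] is approximated by len(1)
    point2 = set()
    for _ in range(max_steps):
        # Point 2 receives point 1 on loop entry, and len(n + 1) for every
        # len(n) already at point 2 (the effect of L := a.L, then "while true").
        updated = point2 | point1 | {n + 1 for n in point2}
        if updated == point2:
            return point2, True
        point2 = updated
    return point2, False

lengths, stable = propagate_program3(5)
print(sorted(lengths), stable)
```

However large the budget, each round adds one new length, so the flag never becomes true.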

Although Program 3 is somewhat artificial, since it is easy to see that
the loop in this program does not terminate, the situation illustrated by
this program frequently arises. Typically the conditions that appear in a
program are too complex to analyze exactly, and often the approximations
used are tantamount to replacing the condition with true. More abstractly,
program analysis must effectively deal with non-termination and infinite
collections of values for two main reasons. First, we cannot in general statically
determine whether a program will terminate. Even if a program is
guaranteed to terminate, it is rarely feasible to analyze the program exactly, and
the process of approximating the behavior of a program typically introduces
non-terminating computation. Second, program analysis is usually carried
out in some initial context or environment that involves descriptions of
infinite sets of values. For example, we may wish to analyze the while loop
of Program 2 in the context where L is an arbitrary list consisting of the
constants a and b and X is an arbitrary value.

Since the treatment of infinite sets of values is a necessary part of
program analysis, a fundamental question arises: how can the exhaustive
propagation process be made to terminate? Clearly if there are only a finite
number of approximate values, then the propagation process always
terminates. This is because the propagation process is monotonic (each step
serves to increase the information at each point), and so there can only be a
finite number of increments of the information at each point. However, less
restrictive approaches are possible. To describe these, we first formulate the
exhaustive propagation process as an iterative least fixed point computation.

In essence, the process of propagating information from one program
point to another can be characterized as a function. Specifically, let T
denote the function which, when given an association of information with
program points, computes the result of updating this association by performing
a “single-step” propagation of its information. The exhaustive propagation
of information from one point to the other then essentially corresponds to
starting with the association ⊥ that associates the empty set with each
program point, and then exhaustively applying the function T until no new
information is obtained. That is, the exhaustive propagation process
corresponds to iteratively computing the least fixed point of T by constructing
the sequence

    ⊥, T(⊥), T(T(⊥)), ...

Note  that  this  correspondence  is  not  exact  since  the  repeated  application 
of  T  corresponds  to  a  specific  order  of  propagation  of  information  from  one 
point  to  another.  However  the  iterative  fixed  point  computation  provides  an 
important  characterization  of  the  termination  properties  of  the  exhaustive 
propagation  process  in  the  following  sense:  exhaustive  propagation  only 
terminates  when  the  iterative  fixed  point  computation  terminates. 
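The iteration just described can be sketched in a few lines. This is only a minimal illustration: the two-point association, the name BOTTOM and the particular propagation function T below are hypothetical and are not taken from any program in this chapter. An association is represented as a pair of value sets, one per program point.

```python
def least_fixed_point(step, bottom):
    """Build bottom, step(bottom), step(step(bottom)), ... until nothing changes."""
    current = bottom
    while True:
        updated = step(current)
        if updated == current:      # no new information: a fixed point is reached
            return current
        current = updated

# Hypothetical example: the association that gives the empty set to both points.
BOTTOM = (frozenset(), frozenset())

def T(assoc):
    """One single-step propagation; increasing, because old information is kept."""
    point1, point2 = assoc
    new_point1 = point1 | {0}                                      # X := 0
    new_point2 = point2 | point1 | {(v + 1) % 3 for v in point2}   # X := (X+1) mod 3
    return (frozenset(new_point1), frozenset(new_point2))

result = least_fixed_point(T, BOTTOM)
```

The loop stops here only because the hypothetical value space {0, 1, 2} is finite; with an unbounded value space the same loop need not terminate, which is the issue taken up next.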

Now, the function T maps from and into associations of information with
program points. Let V denote the set of such associations. These
associations can be ordered according to the amount of information they contain.
The association ⊥, which associates the empty set with each program point,
is the smallest association. More generally, if A1 and A2 are associations
then we write A1 ⊆ A2 when, for each program point, the information of
A2 is at least that of A1. According to this ordering, T is an increasing
function in the sense that A ⊆ T(A). This means that the iterative fixed
point computation is increasing in the sense that

    ⊥ ⊆ T(⊥) ⊆ T(T(⊥)) ⊆ ···

Clearly the iterative fixed point computation is guaranteed to terminate
if V is finite. Termination is also guaranteed if V does not contain any
sequences of elements of the form A1 ⊆ A2 ⊆ A3 ⊆ ··· such that each Ai is a
distinct association. That is, termination is guaranteed if V does not have
any infinite ascending chains.

      L := b.nil;
    ①
      while true do
    ②
          L := a.L;
    ③

    Point   Environments                                     Revised Abst. Int.

    ①       {[L↦b.nil]}                                      {[L↦len+(1)]}
    ②       {[L↦b.nil], [L↦a.b.nil], [L↦a.a.b.nil], ...}     {[L↦len+(1)]}
    ③       {}                                               {}

    Figure 2.6: Revised Abstract Interpretation of Program 3

As an example where V is infinite but has no infinite ascending chains,
consider the abstract interpretation of Program 3 using approximate values
of the form len+(n) denoting lists of length n or more. Again, environments
are approximated by using mappings from variables into approximate
values. The set V of associations is infinite, but does not contain any infinite
ascending chains, and so iterative fixed point computation over V always
terminates. As an example, the analysis of Program 3 terminates, and is
presented in Figure 2.6. Note that, for termination reasons, it is important in
this example to maintain the information at each point in a non-redundant
form. Consider, for example, the situation when the approximate
environment at ② is [L↦len+(1)] and a propagation step computes the “new”
approximate environment [L↦len+(2)] for ②. Since this new information
is already subsumed by [L↦len+(1)], the information at ② need not be
changed.
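The role of the non-redundant form can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the analysis itself: an approximate value len+(n) is represented simply by the integer n, the loop point of Program 3 is called point2, and the helper names are hypothetical.

```python
def subsumes(m, n):
    """len+(m) describes every list that len+(n) describes exactly when m <= n."""
    return m <= n

def add_nonredundant(info, n):
    """Add len+(n) to a set of len+ bounds, keeping the set non-redundant."""
    if any(subsumes(m, n) for m in info):
        return info                       # already covered: nothing changes
    return frozenset(m for m in info if not subsumes(n, m)) | {n}

def analyze_program3():
    point1 = frozenset({1})               # L := b.nil gives len+(1)
    point2 = frozenset()                  # top of the loop body, initially empty
    steps = 0
    while True:
        steps += 1
        new = point2
        for n in point1:                  # flow from point 1 into the loop
            new = add_nonredundant(new, n)
        for n in point2:                  # flow around the loop: L := a.L
            new = add_nonredundant(new, n + 1)
        if new == point2:
            return point2, steps
        point2 = new

info_at_loop, passes = analyze_program3()
```

In this sketch the second pass proposes len+(2), finds it subsumed by len+(1), and so leaves the information unchanged. Without the subsumption check the set would grow by one bound on every pass and the loop would not stop.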

In short, only certain collections of approximate values can be used in
an abstract interpretation. Although the set of associations V need not be
finite, it must be essentially finite in the sense that only a finite number of
elements of V are visited in any fixed point computation. This is a very
significant restriction. It is worth noting that there are methods for obtaining
terminating algorithms even when the collection of approximate values does
not satisfy such a finiteness criterion. This is achieved by computing
something other than the least fixed point of the propagation function T. One
such technique is widening [13]. To illustrate widening, consider analyzing
Program 3 using the approximate values of the form len(n) and len+(n),
where the former denotes lists of length n, and the latter denotes lists of
length at least n. Now, the analysis of Program 3 using these approximate
values is identical to the analysis outlined in the table of Figure 2.5, and
does not terminate. The reason is that the propagation process successively
adds the sequence of approximate environments [L↦len(1)], [L↦len(2)],
[L↦len(3)], ... to point ②. In essence, the effect of widening is to avoid
such infinite sequences by guessing their limit point.

Specifically, let I denote the information associated with a program
point, and suppose that the propagation process determines that this
information should be updated with the new information I'. Normally, the
information for the program point is updated to I ∪ I' (for the above
analysis of Program 3, this is obtained by taking the union of the approximate
environments in I and I' and then removing any redundant approximate
environments). However in widening, the program point is updated with
I ∇ I', where ∇ is a function that approximates the combination of I and I'.
For example, an appropriate ∇ for analysis of Program 3 would inspect I
and I' and check to see if there are approximate environments of the form
[L↦len(i)] ∈ I and [L↦len(j)] ∈ I' such that i < j. If so, then the
environment in I' would be replaced by [L↦len+(i)] before joining
I' with I. This means that the analysis of Program 3 proceeds as follows:


• Initially the environment [L↦b.nil] is approximated by [L↦len(1)] at
point ①.

• Propagate from ① to update ② to {[L↦len(1)]}.

• Propagate from ② to ②: this yields the new approximate environment
{[L↦len(2)]} at ②, and so widening is used to obtain
{[L↦len(1)]} ∇ {[L↦len(2)]} = {[L↦len+(1)]} for ②.
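The steps above can be sketched as follows. This is a hedged illustration of one possible widening operator for this example, not a definitive one: approximate values are encoded as the pairs ("len", n) and ("len+", n), and the helper names covers, normalize, widen and successor are hypothetical.

```python
def covers(a, b):
    """True if description a includes every list described by b."""
    (kind_a, n_a), (kind_b, n_b) = a, b
    if kind_a == "len+":
        return n_a <= n_b
    return kind_b == "len" and n_a == n_b

def normalize(info):
    """Remove descriptions that are covered by some other description."""
    return frozenset(a for a in info if not any(b != a and covers(b, a) for b in info))

def widen(old, new):
    """Replace len(j) in new by len+(i) when old holds some len(i) with i < j."""
    widened = set()
    for kind, j in new:
        smaller = [i for k, i in old if k == "len" and i < j] if kind == "len" else []
        widened.add(("len+", min(smaller)) if smaller else (kind, j))
    return normalize(old | widened)

def successor(desc):
    """Effect of L := a.L on a description: the list grows by one element."""
    kind, n = desc
    return (kind, n + 1)

def analyze_program3_with_widening():
    point1 = frozenset({("len", 1)})      # L := b.nil
    point2 = frozenset()                  # top of the loop body
    while True:
        incoming = point1 | {successor(d) for d in point2}
        updated = widen(point2, incoming)
        if updated == point2:
            return point2
        point2 = updated

loop_info = analyze_program3_with_widening()
```

With this encoding the loop point settles on len+(1) after the second propagation, matching the last step of the list above.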


Intuitively, the intent of ∇ is to determine if I and I' form the start of a
possible infinitely increasing sequence of approximate environment sets, and
if so, to “round-up” to some appropriate set of environments. Of course,
this “rounding-up” could overshoot the limit of the sequence, and cause the
computation to return something other than the least fixed point of T. To
see this, consider Program 4 in Figure 2.7. This program is essentially
the same as Program 3 except that it terminates after two iterations of
the while-do loop. Now, using the same approximate values as those just

      L := b.nil;
    ①
      while len(L) ≠ 3 do
    ②
          L := a.L;
    ③

    Point   Environments

    ①       {[L↦b.nil]}
    ②       {[L↦b.nil], [L↦a.b.nil]}
    ③       {[L↦a.a.b.nil]}

    Figure 2.7: Program 4 and Its Collecting Semantics


    Point   Environments                 Widening           No Widening

    ①       {[L↦b.nil]}                  {[L↦len(1)]}       {[L↦len(1)]}
    ②       {[L↦b.nil], [L↦a.b.nil]}     {[L↦len+(1)]}      {[L↦len(1)], [L↦len(2)]}
    ③       {[L↦a.a.b.nil]}                                 {[L↦len(3)]}

    Figure 2.8: Abstract Interpretation of Program 4


used to analyze Program 3, and using the same widening operator ∇, the
analysis of Program 4 proceeds identically to that of Program 3, and the
result of the analysis appears in the second column of the table in Figure
2.8. The analysis without widening terminates in this case, and its result
is given in the third column of the table. For this example the widening
operator over-approximates when the approximate environment [L↦len(2)]
is added to point ②, and hence the analysis does not yield the least fixed
point of T, but rather some arbitrary approximation of it. A complementary
technique of narrowing has been developed that partly compensates for the
over-approximation inherent in widening [13]; however, this is not sufficient
to regain the least fixed point.
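The gap between the two results at the loop point of Program 4 can be checked with a short sketch. The encoding is an assumption made for illustration: a list is represented only by its length, the loop point is the top of the loop body, and the function names are hypothetical.

```python
def loop_head_lengths(limit=3):
    """Exact list lengths reaching the top of the loop body of Program 4."""
    seen = set()
    frontier = {1}                          # b.nil has length 1
    while frontier:
        n = frontier.pop()
        if n != limit and n not in seen:    # loop guard: len(L) != 3
            seen.add(n)
            frontier.add(n + 1)             # L := a.L
    return seen

def admitted_by_len_plus(bound, n):
    """True if a list of length n is described by len+(bound)."""
    return n >= bound

exact = loop_head_lengths()
# Lengths up to 10 that len+(1) admits but that never reach the loop point.
spurious = {n for n in range(1, 11) if admitted_by_len_plus(1, n)} - exact
```

Here exact corresponds to the result without widening, {len(1), len(2)}, while len+(1) also admits every longer list. Those extra lengths are the over-approximation introduced by widening.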

In summary, abstract interpretation requires that the collection of
approximate values be essentially finite in character to ensure termination of
the iterative fixed point computation. The techniques of widening and
narrowing can be used to address this restriction, but at the cost of introducing
extra approximation - that is, widening introduces another level of
approximation over and above that introduced by the use of approximate values.
Due to the ad hoc nature of widening and narrowing, it is difficult to give a
formal characterization of the restrictions on the collection of approximate
values necessary for termination. However the following general observation
can be made: abstract interpretation (with or without widening and
narrowing) explicitly constructs a succession of elements from V in order to find
the least fixed point of T (or some approximation thereof), and this
succession of elements must be finite. This implies that only a finite part of the
function space of T can be investigated during this fixed point computation.

This observation has two important implications. First, the finitary
nature of abstract interpretation implies that there is a fundamental limitation
on the accuracy of this approach to program analysis. There are decidable
kinds of analysis that cannot be computed using abstract interpretation
(even with widening and narrowing). The set based analysis considered in
this thesis is one example.

Second, the finitary nature of abstract interpretation means that there
are typically very subtle interactions between the collection of approximate
values used and the operations of the language being analyzed. This
frequently leads to chaotic and unintuitive behavior. In particular, it is often
very difficult for a programmer to determine what an abstract interpretation
based analysis will yield.

For example, consider Program 5 in Figure 2.9, which flattens out and
reverses the input list L so that, on termination, FL is 4.3.2.1.nil.
Suppose that the basic values of this program are integers, characters and lists.
Consider an abstract interpretation of this program in which, corresponding
to the basic values, there are approximate values int, char and atomic
respectively denoting the sets of integers, characters and non-list values. Also
add a family of approximate values list(α) where α ranges over approximate
values. For example, list(int) and list(list(int)) are both approximate
values with the obvious meanings. Now, we might expect that the analysis
of Program 5 using this collection of approximate values might lead to the
approximation of FL at point ② by list(int). However this is not the case
because at point ① the program variable L is in general bound to a list
whose elements are integers and integer lists. Since there is no approximate
value corresponding to such lists, L must be approximated by list at this
point, and it follows that the best that can be obtained for FL at point
② is list(atomic). In essence, the problem relates to the intermediate
values computed by a program. Even though this collection of approximate
values is sufficiently expressive to represent the results of a computation, it

      L := (1.2.nil).(3.4.nil);
      FL := nil;
      while (L ≠ nil) do
    ①
          if car(L) = nil then
              L := cdr(L);
          else if islist(car(L)) then
              L := car(car(L)).cdr(car(L)).cdr(L);
          else
              FL := car(L).FL;
              L := cdr(L);
    ②

    Figure 2.9: Program 5

is not sufficiently expressive to represent intermediate parts of the
computation. Such a lack of uniformity is unavoidable in abstract interpretation
approaches. Program 5 is a very simple program and it is quite easy to
identify why the expected result was not obtained. However, in large programs
this is not usually possible. These deficiencies of abstract interpretation -
lack of uniformity, predictability and stability - are particularly relevant for
scaling up abstract interpretation based approaches to large systems.
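The behavior of Program 5 that causes the difficulty can be observed by running it. The following is a minimal transcription, assuming cons cells are encoded as Python pairs and nil as None; the helper names from_py and to_py and the shapes record are hypothetical additions used only to inspect intermediate values.

```python
NIL = None

def cons(head, tail): return (head, tail)
def car(cell): return cell[0]
def cdr(cell): return cell[1]
def islist(value): return value is NIL or isinstance(value, tuple)

def from_py(items):
    """Build a cons list from a (possibly nested) Python list."""
    result = NIL
    for item in reversed(items):
        result = cons(from_py(item) if isinstance(item, list) else item, result)
    return result

def to_py(cell):
    """Convert a cons list back to a Python list, for inspection only."""
    out = []
    while cell is not NIL:
        head = car(cell)
        out.append(to_py(head) if islist(head) else head)
        cell = cdr(cell)
    return out

def flatten_reverse(L):
    FL = NIL
    shapes = []                           # value of L at the top of each iteration
    while L is not NIL:
        shapes.append(to_py(L))
        if car(L) is NIL:
            L = cdr(L)
        elif islist(car(L)):
            L = cons(car(car(L)), cons(cdr(car(L)), cdr(L)))
        else:
            FL = cons(car(L), FL)
            L = cdr(L)
    return FL, shapes

# (1.2.nil).(3.4.nil) is the cons of the list 1.2.nil onto the list 3.4.nil.
FL, shapes = flatten_reverse(cons(from_py([1, 2]), from_py([3, 4])))
```

The final FL contains only integers, but the recorded intermediate values of L mix integers with integer lists, which is the kind of value that list(int) cannot describe.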

To  address  these  deficiencies,  this  thesis  seeks  an  approach  to  program 
analysis  that  is: 


•  declarative:  we  desire  a  simple  definition  of  approximation  that  has 
an  intuitive  relationship  to  program  meaning  and  is  independent  of 
algorithmic  considerations; 

• accurate: the approximation must be meaningful for program
analysis; and

• decidable: there must be algorithms to compute the approximation.


2.4 Approximation of Values and Variables


At the heart of program analysis is the notion of program approximation.
The example abstract interpretations given in the previous two sections have,
for simplicity of presentation, focussed on approximation of the underlying
values of computation. For example, in the analysis of Program 1, the
integers were approximated using pos, neg and int. In the analysis of
Program 2, atomic constants and lists were approximated using const, empty,
non-empty, list and any. These approximate values were then used to
approximate environments by simply considering mappings from variables into
approximate values. What was not considered in these previous examples
was the possibility for introducing approximation in the treatment of
environments. For example, given a collection of approximate values such as pos,
neg and int, one can either approximate environments by using collections
of mappings from variables into approximate values, or one can approximate
environments using a single mapping from variables into approximate values.
The essential difference between these approaches is that the former captures
some dependencies between possible variable values, whereas the latter
ignores all dependencies between variable values. Consider approximating the
environments {[X↦-1,Y↦1],[X↦3,Y↦-3]}. Using the former approach,
this is approximated by {[X↦neg,Y↦pos],[X↦pos,Y↦neg]}. Using the
latter approach, it is approximated by {[X↦int,Y↦int]}. It is also possible
to extend the ability to represent variable interdependencies further and, for
example, approximate these environments by the formula X = -Y.
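The two ways of approximating environments can be compared directly. The sketch below is an illustration only: environments are Python dictionaries, zero is mapped to int because the text names only pos, neg and int, and the function names are hypothetical.

```python
def sign(n):
    """Approximate an integer by pos, neg or (for zero) int."""
    return "pos" if n > 0 else "neg" if n < 0 else "int"

def join(a, b):
    """Least approximate value describing both a and b."""
    return a if a == b else "int"

def relational(envs):
    """A collection of mappings: one approximate environment per concrete one."""
    return {tuple(sorted((var, sign(val)) for var, val in env.items())) for env in envs}

def independent(envs):
    """A single mapping: each variable is approximated on its own."""
    result = {}
    for env in envs:
        for var, val in env.items():
            result[var] = join(result[var], sign(val)) if var in result else sign(val)
    return result

def describes(abstract_value, n):
    return abstract_value == "int" or abstract_value == sign(n)

ENVS = [{"X": -1, "Y": 1}, {"X": 3, "Y": -3}]
collection = relational(ENVS)
single = independent(ENVS)

# An environment that never occurs: both variables positive.
candidate = {"X": 5, "Y": 7}
in_single = all(describes(single[v], candidate[v]) for v in candidate)
in_collection = any(all(describes(s, candidate[v]) for v, s in env) for env in collection)
```

The single mapping admits the candidate environment while the collection of mappings rejects it, which is exactly the dependency information lost by the second approach.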

In other words, approximation can be introduced in two ways: in the
treatment of the underlying values, and in the treatment of variables.
Approximation may be introduced in the treatment of values through the use
of approximate (or abstract) values that are essentially finite descriptions
of sets of program values. Intuitively, approximation in this case appears
when sets of values are “rounded up” to the nearest approximate value
during the abstract interpretation process. In contrast, approximation may be
introduced in the treatment of variables by forgetting some of the
dependencies that arise in the treatment of variables. To illustrate the kinds of
dependencies that may arise, consider again the question of representing
{[X↦-1,Y↦1],[X↦3,Y↦-3]}. As noted previously, we could choose to
represent these environments using {[X↦neg,Y↦pos],[X↦pos,Y↦neg]}
(which captures some dependencies between X and Y) or {[X↦int,Y↦int]}
(which ignores all dependencies between X and Y). As another example of
dependencies introduced through the treatment of variables, consider the
program statement Y := pair(X,X). When such a statement is analyzed,
there is a choice of how the two occurrences of X are to be treated. For
example, if the approximate environment before execution of this statement
maps X into pos, then we could represent the result for Y by pair(pos,pos)
indicating that Y is a pair whose first and second components are positive
numbers, or by “pair(v,v) ∧ v ∈ pos” indicating a dependency between the
arguments of pair. We shall collectively refer to all dependencies that may
be introduced by variables as inter-variable dependencies. We remark that
the notion of inter-variable dependency was first considered by Jones and
Muchnick [33], who used the term independent attribute analysis.

Note that the distinction between approximations of values and
approximations of inter-variable dependencies is sometimes obscured by interactions
between inter-variable dependencies and values. For example, the
environments {[X↦-1,Y↦1],[X↦3,Y↦-3]} could be approximated by the
formula (X = -Y) ∧ ((X = Y - 2) ∨ (X = Y + 6)), and even though there
may be some underlying approximation of values, the expressive power of
the inter-variable dependencies mechanism allows the actual values of X and
Y to be completely recovered, and so, in effect, there is no approximation
of the values.

Most program analysis algorithms incorporate approximation of the
underlying values as well as approximation of inter-variable dependencies. (For
efficiency reasons, analyzers often completely omit reasoning about
inter-variable dependencies [33].) A fundamental question is: what are the
minimal notions of approximation required for the decidability of the resulting
analysis? Some approximation of inter-variable dependencies is necessary
because exactness on such dependencies essentially gives exact program
analysis. What has not been addressed is whether or not program analysis is
decidable when ignoring inter-variable dependencies is the only
approximation used. Furthermore, can this be used as the basis of a practical program
analysis system? We will show that ignoring inter-variable dependencies is
sufficient for decidability and can be used for practical program analysis.


2.5 Set Based Analysis


The set based approach to program analysis has its origins in the use of
constraints to perform type analysis of programs [32, 48, 63]. In essence, set
based analysis involves first writing set constraints (a calculus for
expressing relationships between sets of program values) to describe the run-time
behavior of a program, and then solving these constraints to find their most
accurate solution. The fundamental difference between set based analysis
and abstract interpretation approaches is that set based analysis does not
use an iterative fixed point computation over an (essentially finite) collection
of approximate values. In particular, there are no depth bounds or other a
priori restrictions on the sets of values that can be manipulated during the
analysis.

If  one  takes  a  very  broad  view  of  abstract  interpretation  as  a  framework 
for  defining  program  approximations  (as  opposed  to  the  more  algorithmic 
iterative  fixed  point  view),  then  set  based  analysis  can  be  formulated  as  an 
abstract  interpretation.  However,  the  corresponding  iterative  fixed  point 
computations  do  not  terminate,  and  so  iterative  fixed  point  computation 
cannot  be  used  in  set  based  analysis.  One  of  the  main  issues  addressed  by 
this  thesis  is  the  development  of  algorithms  to  show  that  set  based  analysis 
is  decidable. 


Set  Constraints 

Consider analyzing a program in such a way that inter-variable dependencies
are ignored. That is, we wish to avoid using reasoning such as “variable X
takes value a iff variable Y takes value b, and X takes value c iff Y takes value
d”. Instead, we wish to reason about the program by considering the sets of
values that each program variable can assume, and ignoring dependencies
between the elements of these sets. In essence, program variables are treated
as sets, and this is the motivation for describing such analysis as set based
analysis. Specifically, for each program variable X and program point μ,
we introduce a set variable X^μ to denote the set of values of the program
variable X at the point μ. Then, by inspecting the program, we construct
constraints between these set variables to capture the relationships between
program variables that are contained in P. We shall now illustrate how these
constraints may be constructed. Note that the constraints demonstrated in
this section are only suggestive of how this may be done, and the actual set
constraints used in set based analysis are somewhat more complicated and
more accurate.

Consider again the imperative program in Figure 2.10; for clarity, the
following discussion uses cons for lists rather than the infix notation.
Introduce set variables to denote the values of each variable at each
program point. For example, let L^② and X^② respectively be the set variables
corresponding to L and X at program point ② (that is, just before
execution of statement 4), and let L^③ and X^③ respectively be the set variables
corresponding to L and X at point ③. Now corresponding to statement 4,
X := car(L), consider the following constraints:

    L^③ ⊇ L^②

    X^③ ⊇ car(L^②).

The first constraint specifies that the values for L after execution of
statement 4 must include all those before the execution of statement 4. The
second constraint specifies that the values for X after execution of statement 4
must include the car of values of L before statement 4. The symbol car in
this last constraint denotes the set-wise version of the program symbol car.
Specifically, where S is a set of values, car(S) denotes {v1 : cons(v1,v2) ∈ S}.
Note that these constraints could have been expressed using set equality
rather than containment. However, the use of containment yields constraints
that express a minimal notion of consistency between the sets at each
program point. It also simplifies the treatment of converging paths of control
flow. We shall return to this issue later, but first we give the rest of the
constraints for the program.

    Figure 2.11: Set Constraints for Program 2

Consider statement 5. Introduce set variables L^④ and X^④ to describe
the values of L and X at point ④, and construct the following constraints:

    L^④ ⊇ cdr(L^③)

    X^④ ⊇ X^③

The first constraint specifies that the values for L after statement 5 must
include the cdr of all values for L before statement 5, and the second specifies
that the values for X after statement 5 must include all values for X before
the statement.

Adding constraints for the remaining statements leads to the
collection of constraints in Figure 2.11, where the symbol nil in the set
constraints denotes the singleton set of values {nil}, ¬nil denotes the set
of all values different from nil, and ∩ has its usual set theoretic meaning.
Note the treatment of the while-do statement. For example, the constraint
L^② ⊇ L^① ∩ ¬nil corresponds to the possibility of flow of control from point ①
to point ②, and states that the values for L at ② must contain the L values
at ① that are different from nil. Similarly the constraint L^② ⊇ L^④ ∩ ¬nil
corresponds to the possibility of flow of control from point ④ to point ②.

These constraints represent an approximation of the relationships
implicit in the imperative program. They are conservative in the sense that any
assignment of sets to the set variables that satisfies each constraint is
guaranteed to contain all of the possible values encountered at run-time. It is
therefore natural to consider the least such assignment of sets because this
yields the most accurate information. For the constraints in Figure 2.11, the
least assignment of sets that satisfies the constraints is


    L^①  ↦  {cons(a, cons(b, nil))}
    X^①  ↦  {c}
    L^②  ↦  {cons(a, cons(b, nil)), cons(b, nil)}
    X^②  ↦  {a, b, c}
    L^③  ↦  {cons(a, cons(b, nil)), cons(b, nil)}
    X^③  ↦  {a, b}
    L^④  ↦  {cons(b, nil), nil}
    X^④  ↦  {a, b}
    L^⑤  ↦  {nil}
    X^⑤  ↦  {a, b, c}

This clearly represents an approximation of the values that are encountered
at run-time. For example, at point ⑤, the only value encountered at
run-time for X is b, but this is approximated here by {a, b, c}. The reason for
this approximation is that the dependencies between L and X have been ignored,
so that the relationship “L takes value nil iff X takes value b” is ignored. In
general, we shall refer to an assignment of sets to set variables that satisfies a
collection of constraints as a model of the constraints. The above assignment
is the least model of the constraints in Figure 2.11.
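The least model above can be reproduced mechanically. The sketch below rests on several assumptions that should be read as such: the constraint list is a reconstruction consistent with the constraints discussed in the text and with the least model shown, the five program points are named 1 to 5, and the set of values happens to be finite for this program, so a naive accumulation loop terminates. It is not the solving algorithm developed in this thesis.

```python
NIL = "nil"

def cons(head, tail): return ("cons", head, tail)
def is_cons(v): return isinstance(v, tuple) and v[0] == "cons"

def set_car(s): return {v[1] for v in s if is_cons(v)}   # set-wise car
def set_cdr(s): return {v[2] for v in s if is_cons(v)}   # set-wise cdr
def non_nil(s): return {v for v in s if v != NIL}        # intersection with not-nil
def only_nil(s): return {v for v in s if v == NIL}       # intersection with nil

# Each entry reads "left-hand set variable contains right-hand expression".
CONSTRAINTS = [
    ("L1", lambda m: {cons("a", cons("b", NIL))}),
    ("X1", lambda m: {"c"}),
    ("L2", lambda m: non_nil(m["L1"])),   # entering the loop
    ("L2", lambda m: non_nil(m["L4"])),   # going around the loop
    ("X2", lambda m: m["X1"]),
    ("X2", lambda m: m["X4"]),
    ("L3", lambda m: m["L2"]),            # statement 4: X := car(L)
    ("X3", lambda m: set_car(m["L2"])),
    ("L4", lambda m: set_cdr(m["L3"])),   # statement 5: L := cdr(L)
    ("X4", lambda m: m["X3"]),
    ("L5", lambda m: only_nil(m["L1"])),  # leaving the loop
    ("L5", lambda m: only_nil(m["L4"])),
    ("X5", lambda m: m["X1"]),
    ("X5", lambda m: m["X4"]),
]

def least_model(constraints):
    """Grow every set variable until all containment constraints hold."""
    model = {lhs: set() for lhs, _ in constraints}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in constraints:
            missing = rhs(model) - model[lhs]
            if missing:
                model[lhs] |= missing
                changed = True
    return model

model = least_model(CONSTRAINTS)
```

Under these assumptions model["X5"] contains a, b and c even though only b is observed at run-time, reflecting the ignored dependency between L and X.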

As noted earlier, the constraints we have used for modeling programs
could have been expressed using set equality rather than containment. For
example the equality

    L^② = (L^① ∩ ¬nil) ∪ (L^④ ∩ ¬nil)

could be used in the place of the two constraints L^② ⊇ L^① ∩ ¬nil and
L^② ⊇ L^④ ∩ ¬nil in Figure 2.11. However, the use of containment has a
number of advantages. First, it yields constraints that have a more
intuitive reading - they correspond to minimal consistency relationships between
the sets at each program point. The use of equality is in some sense an
over-specification of the relationships inherent in the program. (One
minor advantage of equality is that it reduces the number of models of the
constraints, but note that the constraints still would not define a unique
model.) Second, the use of containment simplifies the construction of the
constraints because it allows constraints to be constructed on a statement
by statement basis, rather than having to predetermine the possible paths
of control into a program point. Third, the correctness of the set constraints
(and the justification that they embody a natural and intuitive notion of
approximation) employs environment constraints, which are like set constraints
except that they describe relationships between sets of environments at each

    ← p(f(X,Y)).                ← ①, p(f(X,Y)), ②.
    p(f(X,Y)) ← q(X,Y).         p(f(X,Y)) ← ③, q(X,Y), ④.
    q(a,b).                     q(a,b) ← ⑤.
    q(c,d).                     q(c,d) ← ⑥.

    Figure 2.12: Program 6 (With and Without Program Points)

program point instead of between sets of variable values. The use of
containment rather than equality in these environment constraints is crucial
for defining notions of program approximation. Since the set constraints
are derived from these environment constraints, the use of containment in
set constraints is a matter of consistency and convenience. Fourth, the use
of containment simplifies the presentation of the algorithms for solving set
constraints because it allows the form of the constraints to be significantly
simplified.

Note that for convenience of presentation, certain aspects of the
behavior of the program have been omitted from the constraints in Figure 2.11.
The main omission is the treatment of program errors. For example,
following a statement such as X := car(L), the values for L must be of the form
cons(···), because otherwise an error must have occurred during the
execution of car(L). The more accurate set constraints described in the body of
this thesis shall take into account such reasoning. Implicit in this reasoning
is an assumption that after a computation encounters an error condition, we
can ignore the remainder of its execution. Specifically, it is assumed that
either (i) when such an error occurs the program aborts, and so control never
reaches the point following the statement X := car(L) with the non-cons L
value, or (ii) if an error occurs, then we are not interested in the subsequent
execution of the program, and so it is permissible for the analysis to be
“unsafe” with respect to such executions because any compiler optimizations made
on the basis of such information can only alter the behavior of a program
after an error has occurred.

We now show how constraints may be used to analyze logic programs.
Consider the logic program in Figure 2.12. Figure 2.12 also contains a
version of this program annotated with program points ①, ②, ③, ④, ⑤ and ⑥.
The points ②, ④, ⑤ and ⑥ represent the points at the end of the execution
of the goal or rule in which they appear. The points ① and ③ denote the
points just before execution of p(f(X,Y)) and q(X,Y) respectively.
Suppose that we wish to analyze this program to determine the set of ground
instances of the bindings for each program variable during a PROLOG style
top-down left-to-right execution of the program. As before, the first step
in the construction of the constraints is the introduction of set variables
X^①, Y^①, X^②, Y^②, ... to collect the values for each program variable at each
program point. Four new set variables Callp, Callq, Retp and Retq are also
introduced. To explain the purpose of these variables, recall that top-down
left-to-right execution of a program involves repeatedly applying the
following step: inspect the left-most atom of the goal and choose a (suitably
renamed) program rule such that the left-most goal atom and the head of
the rule unify and let their most general unifier be θ, replace the left-most
goal atom with the body of the rule, and apply θ to the resulting goal. In
essence, the left-most goal atom acts like a procedure “call”. Such a call
is completed (or “solved”) when all of the subgoals introduced by the call
have themselves been completed, and this is analogous to a procedure
return. Now, the variables Callp and Callq respectively correspond to the
ground instances of the calls made to the predicates p and q during
program execution, and Retp and Retq respectively correspond to the ground
instances of the returns involving p and q during program execution.

To illustrate how constraints are constructed, consider the second rule
of the logic program. In essence, this rule says that one way to solve a call
of the form p(f(X,Y)) is by calling q(X,Y). Hence, the values for variable
X at point ③ (just before the calling of q(X,Y)) are those values of X
such that p(f(X, ...)) is an element of $Call_p$, and this can be written as
the constraint

\[ X^{③} \supseteq \{ X : \exists Y \, (p(f(X,Y)) \in Call_p) \}. \]

At point ④, the value of X must be such that p(f(X, ...)) is an element of
$Call_p$ and q(X,Y) is an element of $Ret_q$, and this can be written as

\[ X^{④} \supseteq \{ X : \exists Y \, (p(f(X,Y)) \in Call_p \wedge q(X,Y) \in Ret_q) \}. \]

This rule also contributes to the sets $Call_q$ and $Ret_p$. Specifically,
the body of the rule initiates a call to q, and the rule as a whole may be
used to solve a call to p. This leads to

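\[
\begin{array}{lcl}
Call_q & \supseteq & q(X^{③}, Y^{③}) \\
Ret_p  & \supseteq & p(f(X^{④}, Y^{④}))
\end{array}
\]

\[
\begin{array}{lcl}
Call_p & \supseteq & p(f(X^{①}, Y^{①})) \\
Call_q & \supseteq & q(X^{③}, Y^{③}) \\
Ret_p  & \supseteq & p(f(X^{④}, Y^{④})) \\
Ret_q  & \supseteq & q(a,b) \\
Ret_q  & \supseteq & q(c,d) \\
X^{①} & \supseteq & \top \\
Y^{①} & \supseteq & \top \\
X^{②} & \supseteq & \{ X : \exists Y \, (p(f(X,Y)) \in Call_p) \} \\
Y^{②} & \supseteq & \{ Y : \exists X \, (p(f(X,Y)) \in Call_p) \} \\
X^{③} & \supseteq & \{ X : \exists Y \, (p(f(X,Y)) \in Call_p) \} \\
Y^{③} & \supseteq & \{ Y : \exists X \, (p(f(X,Y)) \in Call_p) \} \\
X^{④} & \supseteq & \{ X : \exists Y \, (p(f(X,Y)) \in Call_p \wedge q(X,Y) \in Ret_q) \} \\
Y^{④} & \supseteq & \{ Y : \exists X \, (p(f(X,Y)) \in Call_p \wedge q(X,Y) \in Ret_q) \}
\end{array}
\]

Figure 2.13: Set Constraints for Program 6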


The complete constraints for the logic program appear in Figure 2.13,
in which the symbol $\top$ denotes the set of all values. Note that there
are no constraints for points ⑤ and ⑥ because there are no variables in the
rules in which these points appear. The least assignment of sets to set
variables that satisfies the constraints is given by:


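\[
\begin{array}{lcl}
Call_p & \mapsto & \{ p(f(s_1, s_2)) : s_1 \text{ and } s_2 \text{ are values} \} \\
Call_q & \mapsto & \{ q(s_1, s_2) : s_1 \text{ and } s_2 \text{ are values} \} \\
Ret_p  & \mapsto & \{ p(f(a,b)), p(f(a,d)), p(f(c,b)), p(f(c,d)) \} \\
Ret_q  & \mapsto & \{ q(a,b), q(a,d), q(c,b), q(c,d) \}
\end{array}
\]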

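\[
\begin{array}{lcl@{\qquad}lcl}
X^{(?)} & \mapsto & \{ s : s \text{ is a value} \} & Y^{(?)} & \mapsto & \{ s : s \text{ is a value} \} \\
X^{(?)} & \mapsto & \{ a, c \} & Y^{(?)} & \mapsto & \{ b, d \} \\
X^{(?)} & \mapsto & \{ s : s \text{ is a value} \} & Y^{(?)} & \mapsto & \{ s : s \text{ is a value} \}
\end{array}
\]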
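
The effect of ignoring inter-variable dependencies in the rule
p(f(X,Y)) :- q(X,Y) can be illustrated with a small Python sketch. It is an
illustration only, not the constraint-solving algorithm of this thesis. It
assumes that every call p(f(X,Y)) is possible, so that the $Call_p$ conjunct
imposes no restriction, and it takes as given the two returns q(a,b) and
q(c,d) contributed by the facts. The names `q_returns`, `x_end`, `y_end` and
`ret_p` are hypothetical.

```python
from itertools import product

# Returns of q contributed by the two facts q(a,b) and q(c,d) (assumed input).
q_returns = {("a", "b"), ("c", "d")}

# Set variables for X and Y at the end of the rule body: each program
# variable is collected on its own, so the pairing of X with Y is forgotten.
x_end = {x for (x, _) in q_returns}
y_end = {y for (_, y) in q_returns}

# Ret_p is a superset of p(f(X, Y)) built from the two sets independently.
ret_p = {f"p(f({x},{y}))" for x, y in product(x_end, y_end)}

print(sorted(x_end), sorted(y_end))
print(sorted(ret_p))
```

Under these assumptions the sketch yields four returns for p, including
p(f(a,d)) and p(f(c,b)), although q only ever returns the pairs (a,b) and
(c,d). This is the single approximation made by set based analysis.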

Solving Set Constraints

The purpose of set constraints is to capture consistency conditions between
set variables in such a way that the relationships between the variables in
a program are safely approximated. In other words, the constraints are
constructed so that any assignment to the set variables that satisfies the
constraints is correct in the following sense: if $X^{\mu}$ is the set
variable corresponding to program variable X at point $\mu$, and if program
variable X assumes value v at point $\mu$ during some program execution,
then v appears in the set assigned to $X^{\mu}$. Although any model of the
constraints yields correct information, the least model (which is guaranteed
to exist) is preferred because it is the most accurate and has a canonical
definition. Thus, the problem of analyzing a program can be reduced to
computing the least model of a collection of set constraints.

Since the least model of such constraints may be infinite, computing this
model entails constructing a representation of a potentially infinite object.
Moreover, for such a representation to be useful, it must be explicit in the
sense that the structure of the model is self-evident and questions relating
to membership and non-emptiness can easily be answered. A key result
of this thesis is that the sets assigned to variables in the least model of a
collection of set constraints are regular sets of terms in the sense that they
can be described by regular term grammars. Regular term grammars are a
generalization of regular grammars to terms. For example, the regular term
grammar

\[ L \Rightarrow nil \]

\[ L \Rightarrow cons(1, L) \]

defines the set of all lists of 1's. Regular term grammars form the core of
our explicit representation of models. Specifically, the algorithm presented
in this thesis for solving set constraints inputs a collection of constraints
constructed from a program, and outputs, for each set variable appearing
in the constraints, a regular term grammar describing the set of terms
assigned to that variable in the least model of the constraints.
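
To make concrete why a regular term grammar counts as an explicit
representation, the following Python sketch decides membership for the
list-of-1's grammar above. The encoding is an assumption made for
illustration: terms are tuples `(constructor, arg1, ..., argn)`, a bare
string on a right-hand side names a nonterminal, and every right-hand side
is assumed to have a constructor at its root so that the recursion
terminates. The helper names are hypothetical.

```python
# Each nonterminal maps to its list of right-hand sides.
GRAMMAR = {
    "L": [("nil",), ("cons", ("1",), "L")],
}


def derives(grammar, nonterminal, term):
    """Return True if `term` is in the set described by `nonterminal`."""
    return any(matches(grammar, rhs, term) for rhs in grammar[nonterminal])


def matches(grammar, pattern, term):
    """Match a right-hand side (or a part of one) against a ground term."""
    if isinstance(pattern, str):  # a nonterminal: try its productions
        return derives(grammar, pattern, term)
    if pattern[0] != term[0] or len(pattern) != len(term):
        return False  # different constructor or different arity
    return all(matches(grammar, p, t) for p, t in zip(pattern[1:], term[1:]))


def list_of(*items):
    """Build the term cons(i1, cons(i2, ... nil)) from constant names."""
    term = ("nil",)
    for item in reversed(items):
        term = ("cons", (item,), term)
    return term


print(derives(GRAMMAR, "L", list_of("1", "1", "1")))
print(derives(GRAMMAR, "L", list_of("1", "2")))
```

Membership is decided by structural recursion on the term, which is the
sense in which such questions can easily be answered.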

In  summary,  the  process  of  constructing  set  constraints  from  a  program 
is  essentially  just  setting  up  the  analysis  problem,  and  is  analogous  to  writing 
out the definition of the semantic operator $\mathcal{F}$ in an abstract
interpretation based analysis. The real work of analyzing the program is
carried out by the algorithm to construct a representation of the least
model of the constraints. This latter process of solving the constraints
takes the place of the iterative fixed point computation in an abstract
interpretation style analysis. Note that in abstract interpretation style
analysis, the use of constraints is somewhat optional: although the
constraints are always implicitly present, they do not have to be explicitly
constructed. On the other hand, set based analysis places a greater emphasis
on constraints. This is mainly because the
algorithm  for  solving  the  constraints  is  not  based  on  the  notion  of  locally 
propagating  information  from  one  program  point  to  another,  but  rather  the 
algorithm  reasons  about  the  program  as  a  whole.  Such  reasoning  is  most 
conveniently  carried  out  using  constraints. 


2.6  Overview  of  Thesis 


This  thesis  is  structured  in  three  parts.  Part  I  deals  with  the  definition  of  set 
based  analysis.  It  justifies  the  process  of  constructing  set  constraints  from 
a  program  that  has  been  informally  outlined  in  the  previous  section,  and 
shows  how  these  constraints  correspond  to  reasoning  about  a  program  by 
ignoring inter-variable dependencies. Starting with an operational
semantics, we define the notion of collecting semantics. This collecting
semantics is defined directly in terms of the operational semantics by
simply collecting together the appropriate objects (environments or term
equations) for
each  program  point  encountered  during  program  execution.  Although  such 
a  definition  of  collecting  semantics  is  simple  and  natural,  it  sheds  little  light 
onto  how  collecting  semantics  may  be  approximated  and  computed.  The 
next  step  is  therefore  a  constraint  formulation  of  the  collecting  semantics. 
Given  a  program,  we  show  how  environment  constraints  may  be  constructed 
such  that  the  least  model  of  the  environment  constraints  corresponds  to  the 
program’s  collecting  semantics.  The  main  advantage  of  the  environment 
constraints  is  that  they  can  be  re-interpreted  in  a  number  of  different  ways. 
Such  an  alternative  interpretation  is  used  to  define  set  based  program  ap¬ 
proximation.  In  essence,  we  show  how  the  constraints  may  be  interpreted 
so  that  inter-variable  dependencies  may  be  ignored  through  a  process  of 
treating  program  variables  as  sets.  Then,  the  set  based  approximation  of 
a program is defined to be the smallest such “set” interpretation that is a
model of the constraints. That is, the least (standard) model of the
environment constraints gives the program’s collecting semantics, and the
least set model gives the set based approximation of the program.

The same basic plan of operational semantics, collecting semantics,
environment constraints and set based approximation is carried out for both
imperative and logic programs (under a variety of execution strategies). The
operational semantics, collecting semantics and environment constraints for
imperative programs are given in Chapter 3. The corresponding chapter
for logic programs is Chapter 4. Chapter 5 defines the set based
interpretation of the environment constraints, and thus defines set based
program approximation.

Part  II  describes  how  set  based  approximations  may  be  computed.  Chap¬ 
ter  6  introduces  set  constraints,  which  is  the  key  formalism  for  computing 
set  based  approximations.  Most  importantly,  this  chapter  shows  how  envi¬ 
ronment  constraints  may  be  converted  to  set  constraints  such  that  the  least 
set  model  of  the  environment  constraints  corresponds  to  the  least  model 
of  the  set  constraints.  In  other  words,  this  translation  shows  how  the  set 
based  approximation  of  a  program  may  be  represented  as  the  least  model 
of  a  collection  of  set  constraints.  Chapter  7  then  presents  an  algorithm  for 
solving  set  constraints.  This  algorithm  is  presented  in  a  number  of  stages. 
First, a generic set constraint algorithm is presented that abstracts the key
features  of  the  algorithm.  Then,  an  instance  of  the  algorithm  is  defined 
that  deals  with  intersection  and  projection.  Finally  the  complete  algorithm 
is  presented  for  solving  the  kinds  of  set  constraints  obtained  when  environ¬ 
ment  constraints  are  translated  to  set  constraints.  The  last  chapter  of  part  II 
describes  experience  with  a  prototype  implementation.  A  naive  implemen¬ 
tation  of  the  basic  set  constraint  algorithm  is  completely  unusable  except 
for very small collections of constraints. However, substantial progress has
been  made  by  using  specialized  representation  techniques  and  dealing  with 
the  redundancy  that  is  typically  present.  Although  much  work  remains,  the 
results  obtained  so  far  demonstrate  that  practicality  is  within  reach. 

Part III describes some extensions to set based analysis. While the main
body of this thesis deals with the problem of analyzing a program to
determine the possible values that variables can be bound to during program
execution, many of the techniques developed can be applied to other
analysis problems. In particular, many of the algorithms preserve numerous
structural properties of a program, and it is in fact easy to modify the
algorithms to compute instantiation information (for logic programs) as well
as information about sharing. This is the subject of Chapter 9. Chapter
10 shows how the techniques of set based analysis can be extended in
another direction - by adding a limited ability to reason about
inter-variable dependencies. The motivation for this work is that some kinds
of analysis require both accurate treatment of data-structures and reasoning
about inter-variable dependencies. The algorithm presented in this chapter
combines the ability of set constraints to reason about data-structures
with the ability of abstract interpretation to reason about inter-variable
dependencies, in a way that is more accurate than just running both
algorithms. The last chapter of Part III is Chapter 11, where we outline
the application of set based analysis to functional programs, with particular
focus on the language ML. This chapter is largely illustrative, showing
connections between set based analysis and type inference in subtype systems,
as well as relationships to control flow analysis.

Much of the work in this thesis unifies and extends joint work in earlier
papers such as [21, 22, 23, 24, 25, 26]. For details about these papers and
how they relate to this thesis, and for details about related work by other
authors, see Sections 5.6 and 7.7 (the former deals with work in the area
of program approximations, and the latter deals with work in the area of
decidability results and algorithms for set constraints).


Part  I 


Set-Based  Approximation 


The  general  scheme  for  obtaining  set  based  approximations  can  be  described 
as  follows.  Starting  with  a  collecting  semantics  for  a  program,  the  first  step 
is to characterize this semantics in terms of consistency conditions between
the collections of environments identified by the semantics. This is achieved
by introducing a variable to represent each environment collection, and then
constructing environment constraints on these variables to capture
consistency conditions between neighboring program points, in such a way that
the least model of these constraints is exactly the collecting semantics. The
next  step  consists  of  reinterpreting  the  environment  constraints  of  the  pro¬ 
gram  so  that  each  environment  variable  becomes  a  mapping  from  program 
variables  into  sets  of  values.  The  set  based  approximation  of  a  program  is 
then  defined  to  be  the  least  model  of  the  environment  constraints  under 
this  new  interpretation. 

This  part  consists  of  four  chapters.  The  first  two  chapters  present  a  va¬ 
riety  of  operational  semantics  and  corresponding  collecting  semantics  and 
environment  constraints  for  the  two  main  language  paradigms  considered 
in  this  thesis  -  imperative  and  logic  programs.  The  purpose  of  these  chap¬ 
ters  is  to  provide  essential  definitions.  Most  of  the  ideas  they  contain  have 
appeared  elsewhere  in  the  literature  in  one  form  or  another.  One  exception 
is  perhaps  the  heavy  emphasis  on  constraints,  in  contrast  with  the  more 
usual  denotational  semantics  approach.  The  third  chapter  in  this  part  con¬ 
tains  the  definition  of  set  based  approximation  and  is  the  core  chapter  of 
this  part.  The  concluding  chapter  contains  a  discussion  of  this  definition. 

Chapter 3

Imperative  Programs 


We  consider  a  simple  imperative  language  over  data  structures  with  assign¬ 
ment,  conditional  and  iteration  statements.  An  operational  semantics  for 
this  language  is  presented  using  a  rewrite  relation.  A  collecting  semantics 
is  then  defined  by  specifying  an  appropriate  notion  of  program  point,  and 
then  projecting  the  operational  semantics  onto  these  points.  Although  this 
definition  of  collecting  semantics  is  a  very  natural  one  in  the  sense  that  it 
directly captures the notion of what “happens” at each program point, it
provides  little  insight  into  how  a  program  may  be  analyzed.  This  motivates 
an  alternative  characterization  of  the  collecting  semantics  using  environ¬ 
ment  constraints  that  express  notions  of  local  consistency  between  neigh¬ 
boring  program  points.  The  environment  constraints  are  similar  in  spirit  to 
equational  formulations  of  collecting  semantics  used  widely  in  the  program 
analysis  literature. 

3.1 Imperative Programs

The  underlying  values  of  the  language  are  data  structures.  Specifically,  there 
are  constructors  such  as  nil  and  cons  to  build  up  data  structures,  projections 
such  as  car  and  cdr  to  decompose  data  structures,  and  some  basic  primitives 
for  testing  the  outermost  constructor  of  a  term.  The  language  is  untyped. 
We  now  describe  the  details. 

Let VAR and $\Sigma$ be disjoint sets, respectively denoting the set of
program variables and the set of data constructors. Each data constructor is
assumed to have a unique arity. A data constructor with arity 0 is called a
constant. Corresponding to each data constructor f of arity $n \geq 1$, there
are n projection operations, denoted $f_{(1)}^{-1}, \ldots, f_{(n)}^{-1}$.
For example, car and cdr may be denoted by $cons_{(1)}^{-1}$ and
$cons_{(2)}^{-1}$ respectively. An (imperative program) term is either a
variable from VAR or of the form $f(t_1, \ldots, t_n)$ or
$f_{(j)}^{-1}(t_1)$, where $n \geq 0$, f is an n-ary data constructor from
$\Sigma$, each $t_i$ is a term, and in the projection case,
$1 \leq j \leq n$. An atomic program condition is of the form $s = t$ or
$match_f(t)$ where f is a data constructor from $\Sigma$ and s and t are
program terms constructed from program variables and projection symbols.
A program condition is any combination of atomic program conditions using
the usual boolean connectives $\wedge$, $\vee$ and $\neg$.

An imperative program P is a sequence of program statements, Seq,
defined as follows

    Stat ::= X := t
          |  if cond then Seq
          |  while cond do Seq

    Seq  ::= Stat
          |  Seq_1 ; Seq_2

where ; is an associative sequencing operator. Figure 3.1 contains an
example imperative program that computes the last element of the list a.b.nil.
If Stat is of the form if cond then Seq or while cond do Seq, then Seq is
called the body of Stat. The expressions first(Seq), second(Seq) and last(Seq)
respectively denote the first, second and last statements in Seq, if they exist.
The expression tail(Seq) denotes the sequence Seq' whenever Seq is of the
form Stat;Seq'. In the context of a program P, each statement occurrence
in P is assumed to be labeled with a unique integer; labels are denoted by




1.  L := cons(a, cons(b, nil));
2.  X := c;
3.  while (match_cons(L)) do
4.      X := car(L);
5.      L := cdr(L);


Figure  3.1:  Program  2  (Revisited) 


$\gamma$ (possibly subscripted). Writing $Stat^{\alpha}$ denotes the unique
statement occurrence in P with label $\alpha$. Writing
${}^{\alpha}Seq^{\beta}$ denotes that Seq is a sequence of statements such
that first(Seq) is labeled with $\alpha$ and last(Seq) is labeled with
$\beta$. If $Stat^{\alpha}$ and $Stat^{\beta}$ appear as statements
somewhere in P and $Stat^{\beta}$ appears immediately after $Stat^{\alpha}$,
then $Stat^{\alpha}$ and $Stat^{\beta}$ are consecutive statements in P. For
example, in the program in Figure 3.1, statements 1 and 2 are consecutive
statements and so are statements 4 and 5.


3.2  Operational  Semantics 


We now present an operational semantics for imperative programs. A value
is a program term that contains only symbols from $\Sigma$. For example, nil,
cons(a,nil) and cons(a,b) are values, but cdr(cons(b,nil)) and cons(X,Y)
are not. An environment p is a mapping from VAR into values. We shall
write $[X_1 \mapsto v_1, \ldots, X_n \mapsto v_n]$ to denote an environment
that maps $X_i$ into $v_i$, $i = 1..n$. The expression $p[X \mapsto v]$
denotes the environment that maps X into v and maps all other variables Y
into p(Y). An environment can be extended to become a partial function from
program terms t to values as follows:

•  $p(f(t_1, \ldots, t_n)) = f(p(t_1), \ldots, p(t_n))$.

•  $p(f_{(j)}^{-1}(t')) = v_j$ if $p(t') = f(v_1, \ldots, v_n)$ for some
   values $v_1, \ldots, v_n$.

For some program terms t, such as cons(car(nil),Y), p(t) is not defined.
The notation $p \triangleright t$ shall be used to indicate that p(t) is
defined.
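
The extension of an environment to terms can be sketched in Python as
follows. The encoding is an assumption made for illustration: a program
variable is a string, the term f(t1, ..., tn) is the tuple
`(f, t1, ..., tn)`, the projection of f onto argument j applied to t is
`("proj", f, j, t)`, and an environment is a finite dictionary rather than a
total function on VAR. `None` stands for "p(t) is not defined". No
constructor is assumed to be named `proj`.

```python
def evaluate(env, term):
    """Return p(term) for the environment `env`, or None when undefined."""
    if isinstance(term, str):  # program variable
        return env[term]
    if term[0] == "proj":  # projection of constructor f onto argument j
        _, f, j, sub = term
        value = evaluate(env, sub)
        if value is None or value[0] != f:
            return None  # sub is undefined or is not of the form f(...)
        return value[j]  # value is (f, v1, ..., vn); select vj
    args = [evaluate(env, t) for t in term[1:]]
    if any(a is None for a in args):
        return None  # an argument is undefined, so the whole term is
    return (term[0], *args)


NIL = ("nil",)


def cons(head, tail):
    return ("cons", head, tail)


def car(t):
    return ("proj", "cons", 1, t)


def cdr(t):
    return ("proj", "cons", 2, t)


env = {"X": cons(("a",), NIL), "Y": NIL}
print(evaluate(env, car("X")))
print(evaluate(env, cons(car(NIL), "Y")))
```

The second call corresponds to the term cons(car(nil),Y) mentioned above,
for which the sketch reports that no value exists.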

Note that environments are defined to be total functions from (the
possibly infinite set) VAR into values. The reason for adopting this somewhat
non-standard definition is twofold. First, it leads to greater uniformity in
later definitions, such as those dealing with the semantics of logic programs
(Chapter 4). Second, this definition means that all environments have the
same fixed domain, and this results in some significant simplifications in
later definitions, particularly those involving environment constraints and
the translation of environment constraints to set constraints.

We now specify the meaning of program conditions. First define that a
program condition cond is defined under p, denoted $p \triangleright cond$,
if $p \triangleright t$ for each program term t appearing in cond. Now, for
environments p such that $p \triangleright cond$, define the relation
$p \models cond$ as follows.

•  $p \models s = t$ iff $p(s) = p(t)$.

•  $p \models match_f(t)$ iff $p(t)$ is of the form $f(v_1, \ldots, v_n)$.

•  $p \models cond_1 \wedge cond_2$ iff $p \models cond_1$ and $p \models cond_2$.

•  $p \models cond_1 \vee cond_2$ iff either $p \models cond_1$ or $p \models cond_2$.

•  $p \models \neg cond$ iff it is not the case that $p \models cond$.

Since $p \models cond$ is only defined if $p \triangleright cond$, both
$p \models car(nil)$ and $p \models \neg car(nil)$ are undefined. In what
follows, we shall only write the expression $p \models cond$ when it is
clear from context that $p \triangleright cond$.

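A state is a pair of the form $\langle p : Seq \rangle$ where p is an
environment and Seq is either a sequence of statements or the special symbol
empty denoting the empty sequence of statements. The sequencing operator ;
is extended in the obvious way to deal with empty: Seq;empty = Seq =
empty;Seq. The meaning of an imperative program is defined via a rewrite
relation $\to$ on states.

\[
\begin{array}{ll}
\langle p : X := t ; Seq \rangle \to \langle p[X \mapsto p(t)] : Seq \rangle
  & \text{if } p \triangleright t. \\[4pt]
\langle p : (\text{if } cond \text{ then } Seq') ; Seq \rangle \to \langle p : Seq' ; Seq \rangle
  & \text{if } p \triangleright cond \text{ and } p \models cond. \\[4pt]
\langle p : (\text{if } cond \text{ then } Seq') ; Seq \rangle \to \langle p : Seq \rangle
  & \text{if } p \triangleright cond \text{ and } p \models \neg cond. \\[4pt]
\langle p : (\text{while } cond \text{ do } Seq') ; Seq \rangle \to \langle p : Seq' ; (\text{while } cond \text{ do } Seq') ; Seq \rangle
  & \text{if } p \triangleright cond \text{ and } p \models cond. \\[4pt]
\langle p : (\text{while } cond \text{ do } Seq') ; Seq \rangle \to \langle p : Seq \rangle
  & \text{if } p \triangleright cond \text{ and } p \models \neg cond.
\end{array}
\]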

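A derivation is a sequence of rewrite steps of the form

\[ \langle p_0 : Seq_0 \rangle \to \langle p_1 : Seq_1 \rangle \to \cdots \to \langle p_{n-1} : Seq_{n-1} \rangle \to \langle p_n : Seq_n \rangle. \]

We frequently write $\langle p_0 : Seq_0 \rangle \to^{*} \langle p_n : Seq_n \rangle$
or $\langle p_0 : Seq_0 \rangle \to^{n} \langle p_n : Seq_n \rangle$
to denote the existence of such a derivation from $\langle p_0 : Seq_0 \rangle$
to $\langle p_n : Seq_n \rangle$. We shall consistently use $p_0$ to denote
the starting environment of a derivation.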

The program terminates on environment $p_0$ if there is a maximal
derivation of the form $\langle p_0 : P \rangle \to^{*} \langle p : Seq \rangle$.
Since there is at most one rewrite step applicable to a state, there is at
most one such derivation. If the final state is of the form
$\langle p : empty \rangle$ then the program is said to terminate with
environment p. If the final state is not of this form then the program is
said to terminate with an error. This is the case, for example, if the
program attempts to evaluate a term such as car(nil). There is no special
“error” value, but rather an error corresponds to a state from which no
transition is possible.
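
The rewrite relation can be prototyped directly. The Python sketch below is
an illustration under stated assumptions, not part of the formal
development: terms are encoded as tuples (a variable is a string, a
projection is `("proj", f, j, t)`), conditions as `("match", f, t)`,
`("eq", s, t)`, `("not", c)`, `("and", c1, c2)` and `("or", c1, c2)`,
statements as `("assign", X, t)`, `("if", cond, body)` and
`("while", cond, body)`, and a sequence as a tuple of statements with `()`
playing the role of empty. `None` models "not defined". The step bound in
`run` is an addition of the sketch to keep it from looping forever.

```python
def evaluate(env, term):
    """p(term), or None when it is not defined."""
    if isinstance(term, str):
        return env[term]
    if term[0] == "proj":
        _, f, j, sub = term
        value = evaluate(env, sub)
        return None if value is None or value[0] != f else value[j]
    args = [evaluate(env, t) for t in term[1:]]
    return None if any(a is None for a in args) else (term[0], *args)


def holds(env, cond):
    """True/False when cond is defined under env, None when it is not."""
    kind = cond[0]
    if kind == "match":
        value = evaluate(env, cond[2])
        return None if value is None else value[0] == cond[1]
    if kind == "eq":
        left, right = evaluate(env, cond[1]), evaluate(env, cond[2])
        return None if left is None or right is None else left == right
    if kind == "not":
        inner = holds(env, cond[1])
        return None if inner is None else not inner
    left, right = holds(env, cond[1]), holds(env, cond[2])  # "and" / "or"
    if left is None or right is None:
        return None
    return (left and right) if kind == "and" else (left or right)


def step(env, seq):
    """Apply the one applicable rewrite rule, or return None if there is none."""
    if not seq:
        return None
    stat, rest = seq[0], seq[1:]
    if stat[0] == "assign":
        value = evaluate(env, stat[2])
        return None if value is None else ({**env, stat[1]: value}, rest)
    test = holds(env, stat[1])
    if test is None:
        return None  # condition not defined: no transition
    if not test:
        return (env, rest)
    if stat[0] == "if":
        return (env, stat[2] + rest)
    return (env, stat[2] + (stat,) + rest)  # while: body, then the loop again


def run(env, seq, max_steps=1000):
    """Follow rewrite steps until none applies (or the step bound is hit)."""
    for _ in range(max_steps):
        nxt = step(env, seq)
        if nxt is None:
            break
        env, seq = nxt
    return env, seq


A, B, C, NIL = ("a",), ("b",), ("c",), ("nil",)
PROGRAM = (  # the program of Figure 3.1
    ("assign", "L", ("cons", A, ("cons", B, NIL))),
    ("assign", "X", C),
    ("while", ("match", "cons", "L"), (
        ("assign", "X", ("proj", "cons", 1, "L")),
        ("assign", "L", ("proj", "cons", 2, "L")),
    )),
)
final_env, final_seq = run({"L": NIL, "X": NIL}, PROGRAM)
print(final_env, final_seq)
```

A run that ends with a non-empty sequence corresponds to termination with an
error; for instance, executing X := car(L) in an environment where L is nil
leaves the state unchanged because no rule applies.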

Although statement labels have been ignored in the above definition of
rewrite steps and derivations, it is straightforward to extend the definitions
so that $\to$ relates states of the form $\langle p : Seq \rangle$ where Seq
is a sequence of labeled statements. For example,

\[ \langle (X := a)^{\alpha} ; (Y := b)^{\beta} : p \rangle \to \langle (Y := b)^{\beta} : p[X \mapsto a] \rangle \to \langle empty : p[X \mapsto a][Y \mapsto b] \rangle \]

is a valid derivation. In what follows, we shall implicitly assume that
derivations involve labeled statements.


3.3  Collecting  Semantics 


The  operational  semantics  described  in  the  previous  section  defines  how 
a  program  is  executed.  Importantly,  given  a  program  P  and  a  starting 
environment $p_0$, it defines the environment resulting from the computation
of P starting with $p_0$. That is, the operational semantics can be thought
of as a mapping from a program P into its meaning [P] where [P]($p_0$) is
the environment p such that
$\langle P : p_0 \rangle \to^{*} \langle empty : p \rangle$. Note that [P] is, in
general,  a  partial  function. 

However,  such  a  view  of  the  operational  semantics  does  not  say  anything 
about  what  happens  during  the  execution  of  the  program.  For  example, 
it  does  not  describe  the  values  that  a  program  variable  may  take  during 
execution.  Such  information  is  clearly  central  to  program  analysis.  What  is 
required  therefore,  is  a  view  of  the  operational  semantics  that  makes  explicit 
what  happens  during  program  execution.  That  is,  we  need  to  collect  the 
environments  encountered  at  each  point  in  the  program  during  the  program 
execution,  and  this  is  called  a  collecting  semantics. 

The  notion  of  collecting  semantics  is  the  starting  point  of  all  formal  treat¬ 
ments  of  program  analysis.  Our  collecting  semantics  is  just  an  explication 
of  information  already  implicit  in  the  operational  semantics.  In  essence,  the 
operational  semantics  is  projected  onto  the  notion  of  program  point.  As  an 
aside, we note that since a collecting semantics describes what happens part
way  through  a  computation  (including  computations  that  lead  to  an  error  or 
do  not  terminate),  a  natural  semantics  style  presentation  of  the  operational 
semantics  [37,  54]  would  be  significantly  less  convenient  than  the  transition 
style  system  we  have  employed. 

We first establish a notation for referring to points in a program. As
mentioned earlier, we assume that each statement occurrence in a program is
uniquely labeled. Now, with each statement occurrence $Stat^{\alpha}$, we
associate two program points $\uparrow\alpha$ and $\downarrow\alpha$ to
respectively indicate the execution states just before and just after
$Stat^{\alpha}$ is executed. A collecting interpretation for an imperative
program is an association of a collection of environments to each program
point. The collecting semantics of an imperative program is the specific
collecting interpretation that associates with each program point the
collection of environments encountered at that point during program
execution.

To see how the collecting semantics may be formalized, first observe that
the execution of a program P starting with environment $p_0$ is completely
characterized by the maximal derivation of the form

\[ \langle p_0 : P \rangle \to \langle p_1 : Seq_1 \rangle \to \langle p_2 : Seq_2 \rangle \to \cdots \tag{3.1} \]

Now, consider collecting the environments for a program point of the form
$\uparrow\alpha$. That is, we wish to collect all of the environments
encountered in the derivation (3.1) just before statement $\alpha$ is
executed. This can be simply stated as:

\[ \{ p : \langle p_0 : P \rangle \to^{*} \langle p : Stat^{\alpha} ; Seq \rangle \}. \]

The collection of environments for a program point $\downarrow\alpha$ is
more involved because first the notion of “just after statement execution”
must be formalized. To this end, define a reflexive transitive ordering on
sequences of statements: $Seq_1 \geq Seq_2$ if

\[ Seq_1 \text{ is of the form } Seq ; Seq_2 \text{ for some } Seq. \]

In other words $Seq_1 \geq Seq_2$ if $Seq_2$ is equal to $Seq_1$ or else
$Seq_2$ is a final subsequence of $Seq_1$. For example
$Stat_1;Stat_2;Stat_3 \geq Stat_2;Stat_3$, but it is not the case that
$Stat_1;Stat_2;Stat_3 \geq Stat_1;Stat_2$. Also define that
$Seq_1 > Seq_2$ if $Seq_1 \geq Seq_2$ and $Seq_1 \neq Seq_2$. Now, suppose
that derivation (3.1) has the form:

\[ \langle p_0 : P \rangle \to^{*} \langle p_n : Stat^{\alpha} ; Seq \rangle \to \langle p_{n+1} : Seq_{n+1} \rangle \to \cdots \to \langle p_{n+i} : Seq_{n+i} \rangle \to \cdots \]

The transition from state $\langle p_n : Stat^{\alpha} ; Seq \rangle$ starts
an execution of statement $Stat^{\alpha}$. Now, the execution of this
statement could be completed in one step (this is the case, for example, if
$Stat^{\alpha}$ is an assignment), and then the state
$\langle p_{n+1} : Seq_{n+1} \rangle$ is of the form
$\langle p_{n+1} : Seq \rangle$. On the other hand, $Stat^{\alpha}$ may take
more than one step to execute (this is the case, for example, if
$Stat^{\alpha}$ is a while-do statement whose condition is satisfied by
$p_n$), and then the state $\langle p_{n+1} : Seq_{n+1} \rangle$ will be such
that $Seq_{n+1} > Seq$. In fact the execution of $Stat^{\alpha}$ continues
while the subsequent states $\langle p_{n+i} : Seq_{n+i} \rangle$ are such
that $Seq_{n+i} > Seq$. Two possibilities arise: either (a)
$Seq_{n+i} > Seq$ for all i, and $Stat^{\alpha}$ never completes execution,
or (b) $Seq_{n+j} = Seq$ for some $j \geq 1$, and this means that
$Stat^{\alpha}$ completes execution when state
$\langle p_{n+j} : Seq \rangle$ is reached. In case (b), $p_{n+j}$ is an
environment encountered just after $Stat^{\alpha}$ has completed execution.
So, corresponding to each point $\downarrow\alpha$, we wish to collect
environments $p_{n+j}$ such that

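\[ \langle p_0 : P \rangle \to^{*} \langle p_n : Stat^{\alpha} ; Seq \rangle \to^{j} \langle p_{n+j} : Seq_{n+j} \rangle \tag{3.2} \]

where $Seq_{n+j}$ is Seq and $Seq_{n+i} > Seq$ for $1 \leq i < j$. For
notational convenience, we write
$\langle p_1 : Seq_1 \rangle \to^{m}_{Seq} \langle p_m : Seq_m \rangle$ if
there exists a derivation
$\langle p_1 : Seq_1 \rangle \to \cdots \to \langle p_m : Seq_m \rangle$
where $Seq_i > Seq$, $1 \leq i < m$.

(We shall also sometimes omit the length of the derivation and just write
$\langle p_1 : Seq_1 \rangle \to_{Seq} \langle p_m : Seq_m \rangle$.) Using
this notation, the set of environments defined by (3.2) can be more
concisely described as the set of p such that

\[ \langle p_0 : P \rangle \to^{*} \langle p' : Stat^{\alpha} ; Seq \rangle \to_{Seq} \langle p : Seq \rangle \]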

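To give some intuition about this definition, note that the property
$\langle p' : Stat^{\alpha} ; Seq \rangle \to^{j}_{Seq} \langle p : Seq \rangle$
holds iff there exist environments $p_2, \ldots, p_{j-1}$ and statement
sequences $Seq_2, \ldots, Seq_{j-1}$ such that

\[ \langle p' : Stat^{\alpha} ; Seq \rangle \to \langle p_2 : Seq_2 ; Seq \rangle \to \cdots \to \langle p_{j-1} : Seq_{j-1} ; Seq \rangle \to \langle p : Seq \rangle. \]

Moreover, from the definition of $\to$, this is a derivation iff

\[ \langle p' : Stat^{\alpha} \rangle \to \langle p_2 : Seq_2 \rangle \to \cdots \to \langle p_{j-1} : Seq_{j-1} \rangle \to \langle p : empty \rangle. \]

Hence (3.2) is equivalent to the existence of two derivations

\[ \langle p_0 : P \rangle \to^{*} \langle p_n : Stat^{\alpha} ; Seq \rangle \quad \text{and} \quad \langle p_n : Stat^{\alpha} \rangle \to^{*} \langle p_{n+j} : empty \rangle \]

where the first derivation corresponds to execution reaching statement
$Stat^{\alpha}$ and the second corresponds to the execution of
$Stat^{\alpha}$.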

Before presenting the complete collecting semantics, we address the issue
of starting environments. In the operational semantics, it was appropriate
to define program execution from some given starting environment $\rho_0$. This
was carried over in the above discussion of collecting semantics. However,
when performing analysis of a program, the initial environment may not be
known. This issue may be addressed in a number of ways. First, program
execution could be defined to start in a fixed initial environment (which,
say, maps all variables to nil). Second, programs could be defined to begin
with a sequence of assignment statements that initialize all program
variables, and then the initial environment is essentially irrelevant. Third,
the collecting semantics could be defined with respect to a set S of starting
environments (although note that this introduces the issue of how S
is represented). Fourth, collecting semantics could be defined to be the
environments encountered over executions from all possible starting
environments. Of these four possibilities, the last two are the most reasonable.
For simplicity, we choose the last one, although note that the definitions
and algorithms presented in this thesis can easily be adapted to deal with
the third possibility if S is represented using regular tree grammars (see
Section 7.1). The collecting semantics of an imperative program P can now
be presented.

    points                          environments

    $\uparrow 1$                    {all environments}
    $\downarrow 1, \uparrow 2$      {$[X \mapsto a.b.nil, Y \mapsto v]$ : for any value $v$}
    $\downarrow 2$                  {$[X \mapsto a.b.nil, Y \mapsto c]$}
    $\uparrow 3$                    {$[X \mapsto a.b.nil, Y \mapsto c]$, $[X \mapsto b.nil, Y \mapsto a]$, $[X \mapsto nil, Y \mapsto b]$}
    $\uparrow 4$                    {$[X \mapsto a.b.nil, Y \mapsto c]$, $[X \mapsto b.nil, Y \mapsto a]$}
    $\downarrow 4, \uparrow 5$      {$[X \mapsto a.b.nil, Y \mapsto a]$, $[X \mapsto b.nil, Y \mapsto b]$}
    $\downarrow 5$                  {$[X \mapsto b.nil, Y \mapsto a]$, $[X \mapsto nil, Y \mapsto b]$}
    $\downarrow 3$                  {$[X \mapsto nil, Y \mapsto b]$}

Figure 3.2: Collecting Semantics for Program 2


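Definition 1 The collecting semantics $CS_P$ of an imperative program P is
the mapping:

$$\uparrow\alpha \;\mapsto\; \{\rho : \exists \rho_0 \text{ s.t. } \langle \rho_0 : P \rangle \to^{*} \langle \rho : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle\},$$

$$\downarrow\alpha \;\mapsto\; \{\rho : \exists \rho_0 \text{ s.t. } \langle \rho_0 : P \rangle \to^{*} \langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to^{*}_{\mathit{Seq}} \langle \rho : \mathit{Seq} \rangle\} \quad []$$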

Figure  3.2  presents  the  collecting  semantics  for  Program  2  (see  Figure  3.1). 
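The table in Figure 3.2 can be reproduced mechanically on a small scale. The following Python sketch is only an illustration under stated assumptions: the tuple encoding of statements, the helper names (`freeze`, `exec_stat`, `collect`) and the statement list `program2` (transcribed as we read it off Figure 3.2) are ours, the set of starting environments is a finite sample rather than all environments, terms and conditions are assumed to be always defined, and every run is assumed to terminate.

```python
from typing import Dict, List, Set, Tuple

Env = Dict[str, object]
Frozen = Tuple[Tuple[str, object], ...]
Point = Tuple[str, int]          # ("up", label) or ("down", label)


def freeze(env: Env) -> Frozen:
    """Hashable snapshot of an environment, ordered by variable name."""
    return tuple(sorted(env.items(), key=lambda kv: kv[0]))


def exec_seq(seq: list, env: Env, cs: Dict[Point, Set[Frozen]]) -> Env:
    for stat in seq:
        env = exec_stat(stat, env, cs)
    return env


def exec_stat(stat: tuple, env: Env, cs: Dict[Point, Set[Frozen]]) -> Env:
    kind, label = stat[0], stat[1]
    # The statement is at the front of the sequence: record at "up label".
    cs.setdefault(("up", label), set()).add(freeze(env))
    if kind == "assign":                     # ("assign", label, X, term)
        env = {**env, stat[2]: stat[3](env)}
    elif kind == "if":                       # ("if", label, cond, body)
        if stat[2](env):
            env = exec_seq(stat[3], env, cs)
    elif kind == "while":                    # ("while", label, cond, body)
        while stat[2](env):
            env = exec_seq(stat[3], env, cs)
            # After the body, the loop is at the front of the sequence again.
            cs.setdefault(("up", label), set()).add(freeze(env))
    # The statement has completed: record at "down label".
    cs.setdefault(("down", label), set()).add(freeze(env))
    return env


def collect(program: list, starts: List[Env]) -> Dict[Point, Set[Frozen]]:
    cs: Dict[Point, Set[Frozen]] = {}
    for rho0 in starts:
        exec_seq(program, dict(rho0), cs)
    return cs


NIL = ()                                     # lists are modelled as tuples
program2 = [                                 # assumed shape of Program 2
    ("assign", 1, "X", lambda e: ("a", "b")),
    ("assign", 2, "Y", lambda e: "c"),
    ("while", 3, lambda e: e["X"] != NIL, [
        ("assign", 4, "Y", lambda e: e["X"][0]),      # car(X)
        ("assign", 5, "X", lambda e: e["X"][1:]),     # cdr(X)
    ]),
]
cs = collect(program2, [{"X": NIL, "Y": NIL}, {"X": ("z",), "Y": "q"}])
```

With this encoding, `cs[("up", 3)]` holds three environments and `cs[("down", 3)]` holds one, in agreement with the corresponding rows of the table; only the rows for points 1 and 2 depend on the sampled starting environments.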

A definition of collecting semantics is the starting point of program analysis.
In particular it provides the primary definition of correctness: an
(approximate) analysis is correct if it yields a conservative approximation of
the collecting semantics. In other words, an analysis is correct if, for each
program point, the set of environments described by the analysis for that
point is a superset of the set of environments described by the collecting
semantics. However, this definition of correctness provides little insight into
how program analysis might be performed. We therefore present an alternative
definition of collecting semantics that provides a more concrete basis
for computing information about the collecting semantics.


3.4  Environment  Constraints 


In essence, environment constraints characterize the collecting semantics of
a program by capturing a notion of "local consistency" between the
collections of environments associated with neighboring program points. We
begin by defining the general form and interpretation of the constraints used.
The following definitions are made in the context of a program P. An
environment variable is a variable that ranges over sets of environments, and
shall be denoted by the symbol $\Psi$. For each program point $\mu$, there is a
distinguished environment variable denoted $\Psi^{\mu}$, whose purpose is to describe
the environments corresponding to point $\mu$. An environment expression is
either an environment variable or an expression of the form $\top$, $\Psi[X \mapsto t]$
or $\Psi[\mathit{cond}]$, where $\Psi$ is an environment variable, $X$ is a program variable, $t$
is a program term and $\mathit{cond}$ is a program condition. Informally, $\top$ denotes
all environments, $\Psi[X \mapsto t]$ is used to model assignment statements and
$\Psi[\mathit{cond}]$ is used to model if-then and while-do statements. An environment
constraint is of the form $\Psi \supseteq ee$ where $\Psi$ is an environment variable
and $ee$ is an environment expression.

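The meaning of environment expressions and constraints is defined in the
context of an interpretation $\mathcal{I}$ that maps each environment variable into a set
of environments. Such a mapping $\mathcal{I}$ is extended to map from environment
expressions into sets of environments as follows:

•  $\mathcal{I}(\top) = \{\text{all environments}\}$.

•  $\mathcal{I}(\Psi[X \mapsto t]) = \{\rho[X \mapsto \rho(t)] : \rho \in \Theta\}$ where $\Theta$ is $\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright t\}$.

•  $\mathcal{I}(\Psi[\mathit{cond}]) = \{\rho \in \Theta : \rho \models \mathit{cond}\}$ where $\Theta$ is $\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright \mathit{cond}\}$.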
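The three clauses above can be read as operations on sets of environments. The Python sketch below illustrates them over finite sets only. The function names are ours, and so is the convention that a term or condition returns `None` to signal that it is undefined in an environment (that is, the side condition written with the triangle fails); this convention assumes `None` is never a legitimate program value.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Tuple

Env = Dict[str, object]
Frozen = Tuple[Tuple[str, object], ...]


def freeze(env: Env) -> Frozen:
    """Hashable snapshot of an environment, ordered by variable name."""
    return tuple(sorted(env.items(), key=lambda kv: kv[0]))


def interp_top(universe: Iterable[Frozen]) -> FrozenSet[Frozen]:
    """I(T): all environments of the (finite, assumed) universe."""
    return frozenset(universe)


def interp_assign(envs: FrozenSet[Frozen], var: str,
                  term: Callable[[Env], Optional[object]]) -> FrozenSet[Frozen]:
    """I(Psi[X -> t]): update X wherever the term t is defined."""
    out = set()
    for frozen in envs:
        rho = dict(frozen)
        value = term(rho)
        if value is not None:                # None: t is undefined in rho
            out.add(freeze({**rho, var: value}))
    return frozenset(out)


def interp_cond(envs: FrozenSet[Frozen],
                cond: Callable[[Env], Optional[bool]]) -> FrozenSet[Frozen]:
    """I(Psi[cond]): keep environments where cond is defined and holds."""
    return frozenset(e for e in envs if cond(dict(e)) is True)


# Example: car(X) is undefined when X is nil (the empty tuple).
envs = frozenset({
    freeze({"X": ("a", "b"), "Y": "c"}),
    freeze({"X": (), "Y": "c"}),
})
car_x = lambda rho: rho["X"][0] if rho["X"] else None
after_assign = interp_assign(envs, "Y", car_x)
non_nil = interp_cond(envs, lambda rho: rho["X"] != ())
```

Both operations are monotonic in `envs`: enlarging the input set can only enlarge the result.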

An interpretation $\mathcal{I}$ is a model of a collection of environment constraints if,
for each constraint $\Psi \supseteq ee$ in the collection, $\mathcal{I}(\Psi) \supseteq \mathcal{I}(ee)$. Interpretations of
environment constraints are ordered componentwise: $\mathcal{I} \subseteq \mathcal{I}'$ if $\mathcal{I}(\Psi) \subseteq \mathcal{I}'(\Psi)$
for all environment variables $\Psi$.

Now, corresponding to an imperative program P, we construct environment
constraints to capture the local consistency conditions between
neighboring points in the program.


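Definition 2 The environment constraints $\mathcal{EC}_P$ corresponding to an imperative
program P consist of the following collections of constraints:

a) $\Psi^{\uparrow\alpha} \supseteq \top$, where $\mathit{Stat}^{\alpha}$ is the first statement of P;

b) $\Psi^{\uparrow\beta} \supseteq \Psi^{\downarrow\alpha}$ if $\mathit{Stat}^{\alpha}$ and $\mathit{Stat}^{\beta}$ are consecutive statements in P;

c) $\Psi^{\downarrow\alpha} \supseteq \Psi^{\uparrow\alpha}[X \mapsto t]$ for each $\mathit{Stat}^{\alpha}$ in P of the form $X := t$;

d) for each $\mathit{Stat}^{\alpha}$ in P of the form if $\mathit{cond}$ then ${}^{\beta}\mathit{Seq}^{\gamma}$:

   $\Psi^{\uparrow\beta} \supseteq \Psi^{\uparrow\alpha}[\mathit{cond}]$
   $\Psi^{\downarrow\alpha} \supseteq \Psi^{\uparrow\alpha}[\neg\mathit{cond}]$
   $\Psi^{\downarrow\alpha} \supseteq \Psi^{\downarrow\gamma}$

e) for each $\mathit{Stat}^{\alpha}$ in P of the form while $\mathit{cond}$ do ${}^{\beta}\mathit{Seq}^{\gamma}$:

   $\Psi^{\uparrow\alpha} \supseteq \Psi^{\downarrow\gamma}$
   $\Psi^{\uparrow\beta} \supseteq \Psi^{\uparrow\alpha}[\mathit{cond}]$
   $\Psi^{\downarrow\alpha} \supseteq \Psi^{\uparrow\alpha}[\neg\mathit{cond}]$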


Before giving a formal statement of the correctness of these constraints, we
shall first provide some motivation for their construction. The constraint
in (a) corresponds to the adopted convention that programs start in an
arbitrary environment. The constraint in (b) expresses a simple containment
relationship for consecutive statements. The constraints in (c), (d) and
(e) correspond to assignment, conditional and iterative statements in the
program.
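Because the constraints are determined purely by the syntax of the program, they can be generated in one pass over the labelled statements. The following Python sketch illustrates this; the tuple encoding of statements, the textual rendering of constraints (`up3`, `down5`, `>=`, `T`) and the statement list `program2` are assumptions made for the example, not notation from the thesis.

```python
from typing import List


def env_constraints(program: list) -> List[str]:
    """Constraints (a)-(e) for a labelled program, rendered as strings."""
    out = [f"up{program[0][1]} >= T"]                     # (a) first statement

    def walk(seq: list) -> None:
        for prev, nxt in zip(seq, seq[1:]):               # (b) consecutive statements
            out.append(f"up{nxt[1]} >= down{prev[1]}")
        for stat in seq:
            kind, a = stat[0], stat[1]
            if kind == "assign":                          # (c) X := t
                out.append(f"down{a} >= up{a}[{stat[2]} -> {stat[3]}]")
                continue
            cond, body = stat[2], stat[3]
            b, g = body[0][1], body[-1][1]                # first and last label of the body
            out.append(f"up{b} >= up{a}[{cond}]")
            out.append(f"down{a} >= up{a}[not {cond}]")
            if kind == "if":                              # (d) exit after the body
                out.append(f"down{a} >= down{g}")
            else:                                         # (e) loop back after the body
                out.append(f"up{a} >= down{g}")
            walk(body)

    walk(program)
    return out


program2 = [                                              # assumed shape of Program 2
    ("assign", 1, "X", "a.b.nil"),
    ("assign", 2, "Y", "c"),
    ("while", 3, "X != nil", [
        ("assign", 4, "Y", "car(X)"),
        ("assign", 5, "X", "cdr(X)"),
    ]),
]
constraints = env_constraints(program2)
```

Note the only difference between the if-then and while-do cases: the environments after the last body statement flow to the point after the statement in the former, and back to the point before the statement in the latter.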

For example, consider again Program 2 from Figure 3.1. The constraint
corresponding to the statement X := car(L) is $\Psi^{\downarrow 4} \supseteq \Psi^{\uparrow 4}[X \mapsto cons^{-1}_{(1)}(X)]$,
denoting that $\Psi^{\downarrow 4}$ contains the environments from $\Psi^{\uparrow 4}$ after they are
modified to map X into the value of car(X). The complete collection of
environment constraints for this program appears in Figure 3.3.

[Figure 3.3: Program 2 and Its Environment Constraints]

Environment constraints are stated as set containment relationships
instead of set equalities because containment leads to a much more flexible
definition. In particular, the use of containment leads to a very weak notion
of local consistency, and so it admits the possibility of models of the
constraints in which the set of environments associated with a program point may be
larger than necessary. In doing so, it allows approximations of the collecting
semantics to be models of the environment constraints. The use of equality
would essentially exclude this possibility.
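As a small worked illustration (the one-statement program here is hypothetical and is not taken from the thesis), consider a program consisting solely of the assignment $\mathit{Stat}^{1}$: X := a. Under containment, both an exact and an over-approximating interpretation are models:

```latex
% Constraints for the hypothetical program  1: X := a
\Psi^{\uparrow 1} \supseteq \top, \qquad
\Psi^{\downarrow 1} \supseteq \Psi^{\uparrow 1}[X \mapsto a]

% An exact model
\mathcal{I}_1(\Psi^{\uparrow 1}) = \{\text{all environments}\}, \qquad
\mathcal{I}_1(\Psi^{\downarrow 1}) = \{\rho[X \mapsto a] : \rho \text{ any environment}\}

% A larger model, which over-approximates the point after the statement
\mathcal{I}_2(\Psi^{\uparrow 1}) = \mathcal{I}_2(\Psi^{\downarrow 1}) = \{\text{all environments}\}

% Both satisfy the constraints, and \mathcal{I}_1 \subseteq \mathcal{I}_2.
```

If the two constraints were equalities instead, $\mathcal{I}_2$ would be rejected, since $\mathcal{I}_2(\Psi^{\downarrow 1})$ contains environments that do not map X to a.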

Importantly, the environment constraints $\mathcal{EC}_P$ of a program P possess a
least model, denoted $lm(\mathcal{EC}_P)$. This follows from Corollary 4 in Appendix
I, noting that the operators of the environment constraint calculus, that
is, the constant $\top$ and the postfix operators $[X \mapsto t]$ and $[\mathit{cond}]$, are all
monotonic operators over sets of environments. This least model provides
an alternative definition of collecting semantics.
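For intuition only, a least model can be computed by brute force when the set of environments is finite: start every environment variable at the empty set and repeatedly add whatever the constraints demand until nothing changes. The Python sketch below does this for a hypothetical one-variable program (1: X := 0; 2: while X < 2 do 3: X := X + 1), identifying an environment with the value of X and replacing "all environments" by an assumed four-element universe. It is not the algorithm developed in this thesis, and it relies entirely on the finiteness assumption to terminate.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Interp = Dict[str, FrozenSet[int]]
Constraint = Tuple[str, Callable[[Interp], FrozenSet[int]]]


def least_model(variables: List[str], constraints: List[Constraint]) -> Interp:
    """Grow every variable from the empty set until all constraints hold."""
    interp: Interp = {v: frozenset() for v in variables}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in constraints:
            grown = interp[lhs] | rhs(interp)   # enforce  lhs >= rhs
            if grown != interp[lhs]:
                interp[lhs] = grown
                changed = True
    return interp


UNIVERSE = frozenset({0, 1, 2, 5})              # assumed finite stand-in for T

constraints: List[Constraint] = [
    ("up1", lambda i: UNIVERSE),                                   # (a)
    ("down1", lambda i: frozenset(0 for _ in i["up1"])),          # (c) X := 0
    ("up2", lambda i: i["down1"]),                                 # (b)
    ("up2", lambda i: i["down3"]),                                 # (e) loop back
    ("up3", lambda i: frozenset(x for x in i["up2"] if x < 2)),   # (e) cond
    ("down2", lambda i: frozenset(x for x in i["up2"] if not x < 2)),  # (e) not cond
    ("down3", lambda i: frozenset(x + 1 for x in i["up3"])),      # (c) X := X + 1
]
model = least_model(["up1", "down1", "up2", "down2", "up3", "down3"], constraints)
```

Because every right-hand side is monotonic, the result is contained in every other model of the same constraints, which is exactly the property that makes it a least model.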

Theorem 1 (Environment Constraint Correctness)

The collecting semantics of an imperative program P maps any program
point $\mu$ into $lm(\mathcal{EC}_P)(\Psi^{\mu})$. []

The proof of this theorem is developed in the next section. It is rather
lengthy and tedious and is included mainly for the sake of completeness.
Note that many accounts of program analysis in the literature simply start
with an equational version of the collecting semantics, and so the step of
proving the equivalence of the collecting semantics induced by an underlying
operational semantics and the equational version of the collecting semantics
is effectively bypassed. However, we believe that an operational semantics
version of the collecting semantics is a more appropriate starting point for
program analysis, and so the issue of proving the equivalence of the two
formulations of collecting semantics must be addressed.


3.5  Environment  Constraint  Correctness 

We begin with some initial properties of the operational semantics of
imperative programs. The following two propositions are simple observations
about the definition of $\to$. The first describes the ways in which a derivation
step can change the sequence of statements in a state. The second shows
that a statement must appear at the front of the statement sequence before
it can be removed.


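Proposition 1 If $\langle \rho_a : \mathit{Seq}_a \rangle \to \langle \rho_b : \mathit{Seq}_b \rangle$ then either

(a) $\mathit{Seq}_a = \mathit{Stat};\mathit{Seq}_b$ for some statement $\mathit{Stat}$,

(b) $\mathit{Seq}_b = \mathit{Seq};\mathit{tail}(\mathit{Seq}_a)$ and $\mathit{first}(\mathit{Seq}_a)$ is a statement of the form
if $\mathit{cond}$ then $\mathit{Seq}$ such that $\rho_a \triangleright \mathit{cond}$ and $\rho_a \models \mathit{cond}$, or

(c) $\mathit{Seq}_b = \mathit{Seq};\mathit{Seq}_a$ and $\mathit{first}(\mathit{Seq}_a)$ is a statement of the form
while $\mathit{cond}$ do $\mathit{Seq}$ such that $\rho_a \triangleright \mathit{cond}$ and $\rho_a \models \mathit{cond}$. []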


Proposition 2 If $\langle \rho_0 : \mathit{Seq}_0 \rangle \to \langle \rho_1 : \mathit{Seq}_1 \rangle \to \cdots \to \langle \rho_n : \mathit{Seq}_n \rangle$ such that
$\mathit{Seq}_0 > \mathit{Seq}$ and $\mathit{Seq}_n \not\geq \mathit{Seq}$ then, for some $k < n$, $\mathit{Seq}_k = \mathit{Seq}$.


Proof: Proposition 1 implies that if $\langle \rho_a : \mathit{Seq}_a \rangle \to \langle \rho_b : \mathit{Seq}_b \rangle$ and $\mathit{Seq}_a > \mathit{Seq}$
then $\mathit{Seq}_b \geq \mathit{Seq}$. Consider applying this fact to the first step $\langle \rho_0 : \mathit{Seq}_0 \rangle \to
\langle \rho_1 : \mathit{Seq}_1 \rangle$ in the above derivation. Since $\mathit{Seq}_0 > \mathit{Seq}$, the fact implies that
$\mathit{Seq}_1 \geq \mathit{Seq}$. Hence, either $\mathit{Seq}_1 > \mathit{Seq}$ or $\mathit{Seq}_1 = \mathit{Seq}$. In the latter case the
proposition is proved. In the former case, the fact can be applied again, this
time to the step $\langle \rho_1 : \mathit{Seq}_1 \rangle \to \langle \rho_2 : \mathit{Seq}_2 \rangle$. Repeating this argument proves
that either there is a $k$ such that $\mathit{Seq}_k = \mathit{Seq}$, or else $\mathit{Seq}_n > \mathit{Seq}$. Since it
is assumed that $\mathit{Seq}_n \not\geq \mathit{Seq}$, the proposition is proved. []

The next two propositions deal with statements $\mathit{Stat}$ that are either
if-then or while-do statements. They prove that the last statement in
the body of such a statement $\mathit{Stat}$ can only be introduced by an execution
of $\mathit{Stat}$. These propositions provide an important connection between the
environments encountered after execution of the last statement in the body
of $\mathit{Stat}$ and the environments encountered after the execution of $\mathit{Stat}$ itself.


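Proposition 3 Let $\mathit{Stat}^{\alpha}$ be if $\mathit{cond}$ then $\mathit{Seq}$ and let $\mathit{last}(\mathit{Seq})$ be $\mathit{Stat}^{\gamma}$.
If $\langle \rho_0 : P \rangle \to^{*} \langle \rho : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle$ then there exists an environment $\rho'$ such that
$\rho' \triangleright \mathit{cond}$, $\rho' \models \mathit{cond}$ and

$$\langle \rho_0 : P \rangle \to^{*} \langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq}' \rangle \to^{*}_{\mathit{Seq}'} \langle \rho : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle.$$

Proof: If $\langle \rho_0 : P \rangle \to^{*} \langle \rho : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle$ then there exists a derivation of the
form

$$\langle \rho_0 : \mathit{Seq}_0 \rangle \to \langle \rho_1 : \mathit{Seq}_1 \rangle \to \cdots \to \langle \rho_{n-1} : \mathit{Seq}_{n-1} \rangle \to \langle \rho_n : \mathit{Seq}_n \rangle$$

where $\langle \rho_0 : \mathit{Seq}_0 \rangle$ is $\langle \rho_0 : P \rangle$ and $\langle \rho_n : \mathit{Seq}_n \rangle$ is $\langle \rho : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle$. Now, pick the
largest $i$ such that $\mathit{Seq}_i \not\geq \mathit{Seq}_n$. Such an $i$ exists because $\mathit{Seq}_0$ (which is just
P) cannot be of the form $\mathit{Seq}_a;\mathit{Stat}^{\gamma};\mathit{Seq}_b$, and so $\mathit{Seq}_0 \not\geq \mathit{Seq}_n$. Also $i$ is less
than $n$ because $\mathit{Seq}_n \geq \mathit{Seq}_n$. By construction, $\mathit{Seq}_i \not\geq \mathit{Seq}_n$ and $\mathit{Seq}_{i+1} \geq
\mathit{Seq}_n$. Hence the $i^{th}$ step in the derivation must introduce the statement
$\mathit{Stat}^{\gamma}$. From Proposition 1, the only statement that could introduce $\mathit{Stat}^{\gamma}$
is $\mathit{Stat}^{\alpha}$. Hence $\mathit{first}(\mathit{Seq}_i)$ must be $\mathit{Stat}^{\alpha}$, $\rho_i = \rho_{i+1}$, $\rho_i \triangleright \mathit{cond}$, $\rho_i \models \mathit{cond}$
and $\mathit{Seq}_{i+1} = \mathit{Seq};\mathit{rest}(\mathit{Seq}_i)$. Moreover, since $\mathit{Seq}_j \geq \mathit{Seq}_n$, $i < j \leq n$, and
$\mathit{Seq}_n$ is $\mathit{Stat}^{\gamma};\mathit{Seq}'$, it must be the case that $\mathit{Seq}_j > \mathit{Seq}'$, $i < j \leq n$. In
summary:

$$\langle \rho_0 : \mathit{Seq}_0 \rangle \to^{*} \langle \rho_i : \mathit{Stat}^{\alpha};\mathit{Seq}' \rangle \to \langle \rho_i : \mathit{Seq};\mathit{Seq}' \rangle \to^{*}_{\mathit{Seq}'} \langle \rho : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle.$$

[]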

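Proposition 4 Let $\mathit{Stat}^{\alpha}$ be while $\mathit{cond}$ do $\mathit{Seq}$ and let $\mathit{last}(\mathit{Seq})$ be $\mathit{Stat}^{\gamma}$.
If $\langle \rho_0 : P \rangle \to^{*} \langle \rho : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle$ then $\mathit{first}(\mathit{Seq}')$ is $\mathit{Stat}^{\alpha}$ and there exists an
environment $\rho'$ such that $\rho' \triangleright \mathit{cond}$, $\rho' \models \mathit{cond}$ and

$$\langle \rho_0 : P \rangle \to^{*} \langle \rho' : \mathit{Seq}' \rangle \to^{*} \langle \rho : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle.$$

Proof: The proof is similar to that of Proposition 3. Again, the assumptions of the
proposition imply the existence of a derivation of the form

$$\langle \rho_0 : \mathit{Seq}_0 \rangle \to \langle \rho_1 : \mathit{Seq}_1 \rangle \to \cdots \to \langle \rho_{n-1} : \mathit{Seq}_{n-1} \rangle \to \langle \rho_n : \mathit{Seq}_n \rangle$$

where $\langle \rho_0 : \mathit{Seq}_0 \rangle$ is $\langle \rho_0 : P \rangle$ and $\langle \rho_n : \mathit{Seq}_n \rangle$ is $\langle \rho : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle$. Pick the
largest $i$ such that $\mathit{Seq}_i \not\geq \mathit{Seq}_n$. By construction, $\mathit{Seq}_i \not\geq \mathit{Seq}_n$ and $\mathit{Seq}_{i+1} \geq
\mathit{Seq}_n$. Hence the $i^{th}$ step in the derivation must introduce the statement
$\mathit{Stat}^{\gamma}$. From Proposition 1, it follows that $\mathit{first}(\mathit{Seq}_i)$ is $\mathit{Stat}^{\alpha}$, $\rho_i = \rho_{i+1}$,
$\rho_i \triangleright \mathit{cond}$, $\rho_i \models \mathit{cond}$ and $\mathit{Seq}_{i+1} = \mathit{Seq};\mathit{Seq}_i$. In summary:

$$\langle \rho_0 : \mathit{Seq}_0 \rangle \to^{*} \langle \rho_i : \mathit{Seq}_i \rangle \to \langle \rho_i : \mathit{Seq};\mathit{Seq}_i \rangle \to^{*} \langle \rho : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle.$$

Now, since $\mathit{Seq}_i \not\geq (\mathit{Stat}^{\gamma};\mathit{Seq}')$ and $(\mathit{Seq};\mathit{Seq}_i) \geq (\mathit{Stat}^{\gamma};\mathit{Seq}')$, it follows
that $\mathit{Seq}$ must be of the form $\mathit{Seq}_a;\mathit{Seq}_b$ such that $\mathit{Seq}_b;\mathit{Seq}_i = \mathit{Stat}^{\gamma};\mathit{Seq}'$
where $\mathit{Seq}_b \neq \mathit{empty}$. Since $\mathit{Stat}^{\gamma}$ is the last element in $\mathit{Seq}$ (and occurs
nowhere else in $\mathit{Seq}$), it follows that $\mathit{Seq}_b$ is $\mathit{Stat}^{\gamma}$. Hence $\mathit{Seq}_i = \mathit{Seq}'$, and
the proposition is proved. []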

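Proposition 5 Let $\mathit{Stat}^{\alpha}$ and $\mathit{Stat}^{\beta}$ appear as consecutive statements in P.
If $\langle \rho_0 : P \rangle \to^{*} \langle \rho : \mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Seq}_b \rangle$ then $\mathit{first}(\mathit{Seq}_b) = \mathit{Stat}^{\beta}$.

Proof: From the definition of consecutive statement, it must be the case
that either P or the body of some statement in P is of the form

$$\mathit{Seq}_1 ; \mathit{Stat}^{\alpha} ; \mathit{Stat}^{\beta} ; \mathit{Seq}_2 .$$

Since each statement in P has a unique label, it follows that if P or the body
of some statement in P is of the form $\mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Seq}_b$, then $\mathit{first}(\mathit{Seq}_b)$ must
be $\mathit{Stat}^{\beta}$. Using this observation, the proposition can be established by a
simple induction argument on the length of derivations. In the base case of
a length 0 derivation, the sequence $\mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Seq}_b$ is just P, and so it is
immediate that $\mathit{first}(\mathit{Seq}_b)$ is $\mathit{Stat}^{\beta}$.

Now suppose that the proposition holds for $\langle \rho_0 : P \rangle \to^{n} \langle \rho_n : \mathit{Seq}_n \rangle$, and
suppose that

$$\langle \rho_0 : P \rangle \to^{n} \langle \rho_n : \mathit{Seq}_n \rangle \to \langle \rho_{n+1} : \mathit{Seq}_{n+1} \rangle.$$

Consider the three cases of $\langle \rho_n : \mathit{Seq}_n \rangle \to \langle \rho_{n+1} : \mathit{Seq}_{n+1} \rangle$ outlined in
Proposition 1. In case (a), $\mathit{Seq}_n = (\mathit{Stat};\mathit{Seq}_{n+1})$, and so if $\mathit{Seq}_{n+1}$ is of the form
$\mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Seq}_b$ then $\mathit{Seq}_n$ is $\mathit{Stat};\mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Seq}_b$, and so $\mathit{first}(\mathit{Seq}_b) =
\mathit{Stat}^{\beta}$ follows directly from the induction hypothesis. In cases (b) and (c),
$\mathit{Seq}_{n+1} = \mathit{Seq};\mathit{Seq}'_{n+1}$ where $\mathit{Seq}$ is the body of an if-then or while-do
statement and $\mathit{Seq}'_{n+1}$ is either $\mathit{rest}(\mathit{Seq}_n)$ or $\mathit{Seq}_n$. Now, if $\mathit{Seq}_{n+1}$ is of
the form $\mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Seq}_b$ there are three possibilities: either (i) $\mathit{Seq} =
\mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{first}(\mathit{Seq}_b);\mathit{Seq}'$ for some $\mathit{Seq}'$, (ii) $\mathit{Seq} = \mathit{Seq}_a;\mathit{Stat}^{\alpha}$ or (iii)
$\mathit{Seq}'_{n+1} = \mathit{Seq}';\mathit{Stat}^{\alpha};\mathit{Seq}_b$ for some $\mathit{Seq}'$. In case (i), $\mathit{first}(\mathit{Seq}_b)$ is $\mathit{Stat}^{\beta}$
because $\mathit{Seq}$ is the body of some statement in P. In case (ii), $\mathit{Stat}^{\alpha}$ is the
last statement in $\mathit{Seq}$, which implies that $\mathit{Stat}^{\alpha}$ and $\mathit{Stat}^{\beta}$ cannot be
consecutive statements in P, and so this case is not possible. In case (iii),
$\mathit{Seq}';\mathit{Stat}^{\alpha};\mathit{Seq}_b$ is a subsequence of $\mathit{Seq}_n$, and so the fact that $\mathit{first}(\mathit{Seq}_b)$ is
$\mathit{Stat}^{\beta}$ follows from the induction hypothesis. []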

Now, the collecting semantics of a program P can be thought of as an
interpretation of the environment constraints $\mathcal{EC}_P$. Specifically, let $\mathcal{I}_{cs}$ denote
the interpretation that maps $\Psi^{\mu}$ into the set of environments associated with
$\mu$ in the collecting semantics. Importantly, $\mathcal{I}_{cs}$ is not only an interpretation
of $\mathcal{EC}_P$, but it is also a model of $\mathcal{EC}_P$.

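Lemma 1 $\mathcal{I}_{cs}$ is a model of $\mathcal{EC}_P$.


Proof: Consider each possible form of constraint in $\mathcal{EC}_P$ in turn. First,
consider a constraint of the form $\Psi^{\uparrow\alpha} \supseteq \top$. Such a constraint is trivially
satisfied since $\mathcal{I}_{cs}(\Psi^{\uparrow\alpha})$ contains all environments, because the collecting
semantics is defined to be the collections of environments encountered when
the program is started in an arbitrary environment.

Second, consider a constraint of the form $\Psi^{\uparrow\beta} \supseteq \Psi^{\downarrow\alpha}$. Such a constraint
is present in the environment constraints of P if $\mathit{Stat}^{\alpha}$ and $\mathit{Stat}^{\beta}$ are
consecutive statements of P. Suppose that $\rho \in \mathcal{I}_{cs}(\Psi^{\downarrow\alpha})$. Then by definition,
there exists an environment $\rho_0$ such that $\langle \rho_0 : P \rangle \to^{*} \langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to^{*}_{\mathit{Seq}}
\langle \rho : \mathit{Seq} \rangle$. By Proposition 5, the first statement of $\mathit{Seq}$ must be $\mathit{Stat}^{\beta}$, and
hence $\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\beta})$.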

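Third, consider a constraint of the form $\Psi^{\downarrow\alpha} \supseteq \Psi^{\uparrow\alpha}[X \mapsto t]$, corresponding
to a statement $\mathit{Stat}^{\alpha}$ of the form $X := t$. Now, suppose that $\rho \in
\mathcal{I}_{cs}(\Psi^{\uparrow\alpha}[X \mapsto t])$. Then, there is an environment $\rho'$ such that $\rho' \in \mathcal{I}_{cs}(\Psi^{\uparrow\alpha})$,
$\rho' \triangleright t$ and $\rho$ is $\rho'[X \mapsto v]$ where $v$ is $\rho'(t)$. Hence $\langle \rho_0 : P \rangle \to^{*} \langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle$
and $\langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to \langle \rho : \mathit{Seq} \rangle$. Combining these two facts proves that
$\langle \rho_0 : P \rangle \to^{*} \langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to \langle \rho : \mathit{Seq} \rangle$ and so $\rho \in \mathcal{I}_{cs}(\Psi^{\downarrow\alpha})$.

Fourth, consider a constraint of the form $\Psi^{\uparrow\beta} \supseteq \Psi^{\uparrow\alpha}[\mathit{cond}]$, corresponding
to a statement $\mathit{Stat}^{\alpha}$ of the form if $\mathit{cond}$ then ${}^{\beta}\mathit{Seq}^{\gamma}$. Now, suppose that
$\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\alpha}[\mathit{cond}])$. Then $\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\alpha})$, $\rho \triangleright \mathit{cond}$ and $\rho \models \mathit{cond}$. Hence
$\langle \rho_0 : P \rangle \to^{*} \langle \rho : \mathit{Seq};\mathit{Seq}' \rangle$ and so $\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\beta})$ since the first statement in
$\mathit{Seq}$ is $\mathit{Stat}^{\beta}$.

Fifth, consider a constraint of the form $\Psi^{\downarrow\alpha} \supseteq \Psi^{\uparrow\alpha}[\neg\mathit{cond}]$, corresponding
to a statement $\mathit{Stat}^{\alpha}$ of the form if $\mathit{cond}$ then ${}^{\beta}\mathit{Seq}^{\gamma}$. Suppose that
$\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\alpha}[\neg\mathit{cond}])$. This implies that $\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\alpha})$, $\rho \triangleright \neg\mathit{cond}$ and
$\rho \models \neg\mathit{cond}$, and hence $\langle \rho_0 : P \rangle \to^{*} \langle \rho : \mathit{Stat}^{\alpha};\mathit{Seq}' \rangle \to \langle \rho : \mathit{Seq}' \rangle$ and it
immediately follows that $\rho \in \mathcal{I}_{cs}(\Psi^{\downarrow\alpha})$.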

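Sixth, consider a constraint of the form $\Psi^{\downarrow\alpha} \supseteq \Psi^{\downarrow\gamma}$ corresponding to a
statement $\mathit{Stat}^{\alpha}$ of the form if $\mathit{cond}$ then ${}^{\beta}\mathit{Seq}^{\gamma}$. Suppose that $\rho \in \mathcal{I}_{cs}(\Psi^{\downarrow\gamma})$.
Hence $\langle \rho_0 : P \rangle \to^{*} \langle \rho' : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle \to^{*}_{\mathit{Seq}'} \langle \rho : \mathit{Seq}' \rangle$. Applying Proposition 3
to $\langle \rho_0 : P \rangle \to^{*} \langle \rho' : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle$ proves that there exists an environment $\rho''$
such that

$$\langle \rho_0 : P \rangle \to^{*} \langle \rho'' : \mathit{Stat}^{\alpha};\mathit{Seq}' \rangle \to^{*}_{\mathit{Seq}'} \langle \rho' : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle.$$

It follows that $\langle \rho_0 : P \rangle \to^{*} \langle \rho'' : \mathit{Stat}^{\alpha};\mathit{Seq}' \rangle \to^{*}_{\mathit{Seq}'} \langle \rho : \mathit{Seq}' \rangle$, and this implies
that $\rho \in \mathcal{I}_{cs}(\Psi^{\downarrow\alpha})$.

Seventh, consider a constraint of the form $\Psi^{\uparrow\alpha} \supseteq \Psi^{\downarrow\gamma}$ corresponding
to a statement $\mathit{Stat}^{\alpha}$ of the form while $\mathit{cond}$ do ${}^{\beta}\mathit{Seq}^{\gamma}$. Suppose that
$\rho \in \mathcal{I}_{cs}(\Psi^{\downarrow\gamma})$. Then $\langle \rho_0 : P \rangle \to^{*} \langle \rho' : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle \to^{*}_{\mathit{Seq}'} \langle \rho : \mathit{Seq}' \rangle$. Applying
Proposition 4 to $\langle \rho_0 : P \rangle \to^{*} \langle \rho' : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle$ proves that there exists an
environment $\rho''$ such that

$$\langle \rho_0 : P \rangle \to^{*} \langle \rho'' : \mathit{Seq}' \rangle \to^{*} \langle \rho' : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle,$$

and that the first statement of $\mathit{Seq}'$ is $\mathit{Stat}^{\alpha}$. Combining this with the fact
that $\langle \rho' : \mathit{Stat}^{\gamma};\mathit{Seq}' \rangle \to^{*} \langle \rho : \mathit{Seq}' \rangle$ proves that $\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\alpha})$.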

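Eighth, consider a constraint of the form $\Psi^{\uparrow\beta} \supseteq \Psi^{\uparrow\alpha}[\mathit{cond}]$, corresponding
to a statement $\mathit{Stat}^{\alpha}$ of the form while $\mathit{cond}$ do ${}^{\beta}\mathit{Seq}^{\gamma}$. Suppose that
$\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\alpha}[\mathit{cond}])$. This implies that $\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\alpha})$, $\rho \triangleright \mathit{cond}$ and $\rho \models \mathit{cond}$.
Hence there is a derivation $\langle \rho_0 : P \rangle \to^{*} \langle \rho : \mathit{Seq}' \rangle$ such that the first statement
of $\mathit{Seq}'$ is $\mathit{Stat}^{\alpha}$. Combining this with $\langle \rho : \mathit{Seq}' \rangle \to \langle \rho : \mathit{Seq};\mathit{Seq}' \rangle$ proves
that $\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\beta})$.

Finally, consider a constraint of the form $\Psi^{\downarrow\alpha} \supseteq \Psi^{\uparrow\alpha}[\neg\mathit{cond}]$ corresponding
to a statement $\mathit{Stat}^{\alpha}$ of the form while $\mathit{cond}$ do ${}^{\beta}\mathit{Seq}^{\gamma}$. Suppose that
$\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\alpha}[\neg\mathit{cond}])$. This implies that $\rho \in \mathcal{I}_{cs}(\Psi^{\uparrow\alpha})$, $\rho \triangleright \neg\mathit{cond}$ and $\rho \models
\neg\mathit{cond}$. Hence there is a derivation $\langle \rho_0 : P \rangle \to^{*} \langle \rho : \mathit{Stat}^{\alpha};\mathit{Seq}' \rangle \to \langle \rho : \mathit{Seq}' \rangle$
and so $\rho \in \mathcal{I}_{cs}(\Psi^{\downarrow\alpha})$. In summary then, we have shown that $\mathcal{I}_{cs}$, the
collecting semantics of P, is a model of each of the constraints in $\mathcal{EC}_P$, the
environment constraints of P. []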

We now complete the proof that $\mathcal{I}_{cs}$ corresponds to the collecting semantics
by showing that any model of the environment constraints must contain
$\mathcal{I}_{cs}$. We begin by showing an important correspondence between the sets of
environments associated with points before and after program statements in
any model of the environment constraints.


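Proposition 6 Let $\mathcal{I}$ be a model of the environment constraints for P. If
$\langle \rho_0 : P \rangle \to^{n} \langle \rho : \mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Stat}^{\beta};\mathit{Seq}_b \rangle$ then $\mathcal{I}$ satisfies $\Psi^{\uparrow\beta} \supseteq \Psi^{\downarrow\alpha}$.

Proof: The proof is by induction on $n$. If $n = 0$, then $\langle \rho_0 : P \rangle \to^{0}
\langle \rho : \mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Stat}^{\beta};\mathit{Seq}_b \rangle$ implies that $\mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Stat}^{\beta};\mathit{Seq}_b$ is P, and
so $\mathit{Stat}^{\alpha}$ and $\mathit{Stat}^{\beta}$ must be consecutive statements in P. Hence the
environment constraints for P include the constraint $\Psi^{\uparrow\beta} \supseteq \Psi^{\downarrow\alpha}$.

Now, suppose that the proposition holds for $n$ and consider the case of
$n + 1$. If $\langle \rho_0 : P \rangle \to^{n+1} \langle \rho : \mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Stat}^{\beta};\mathit{Seq}_b \rangle$ then there exists a state
$\langle \rho_n : \mathit{Seq}_n \rangle$ such that

$$\langle \rho_0 : P \rangle \to^{n} \langle \rho_n : \mathit{Seq}_n \rangle \to \langle \rho : \mathit{Seq}_a;\mathit{Stat}^{\alpha};\mathit{Stat}^{\beta};\mathit{Seq}_b \rangle.$$

Now, if $\mathit{Seq}_n$ is of the form $\mathit{Seq}_c;\mathit{Stat}^{\alpha};\mathit{Stat}^{\beta};\mathit{Seq}_b$ then the proposition
follows immediately from the induction hypothesis. If $\mathit{Seq}_n$ is not of
this form, then the first statement of $\mathit{Seq}_n$ must be an if-then or while-do
statement whose condition is satisfied by $\rho_n$ and whose body introduces one
or both of the statements $\mathit{Stat}^{\alpha}$ and $\mathit{Stat}^{\beta}$. If $\mathit{first}(\mathit{Seq}_n)$ introduces both
statements, then $\mathit{Stat}^{\alpha}$ and $\mathit{Stat}^{\beta}$ are again consecutive statements in P,
and so $\Psi^{\uparrow\beta} \supseteq \Psi^{\downarrow\alpha}$ is a constraint in the environment constraints of P.
On the other hand, if $\mathit{first}(\mathit{Seq}_n)$ introduces only one of $\mathit{Stat}^{\alpha}$ and $\mathit{Stat}^{\beta}$,
then it must be the case that $\mathit{Stat}^{\alpha}$ is the last statement in the body
of $\mathit{first}(\mathit{Seq}_n)$. If $\mathit{first}(\mathit{Seq}_n)$ is an if-then statement, then $\langle \rho_0 : P \rangle \to^{n}
\langle \rho_n : \mathit{Seq}_n \rangle \to \langle \rho_{n+1} : \mathit{Seq}_{n+1} \rangle$ implies that

$$\langle \rho_0 : P \rangle \to^{n} \langle \rho_n : \mathit{Stat}^{\gamma};\mathit{Stat}^{\beta};\mathit{Seq}_b \rangle \to \langle \rho_n : \mathit{Seq}';\mathit{Stat}^{\alpha};\mathit{Stat}^{\beta};\mathit{Seq}_b \rangle$$

where the body of $\mathit{Stat}^{\gamma}$ is $\mathit{Seq}';\mathit{Stat}^{\alpha}$. Now, on applying the induction
hypothesis to $\mathit{Stat}^{\gamma}$ and $\mathit{Stat}^{\beta}$, we have that $\mathcal{I}$ satisfies $\Psi^{\uparrow\beta} \supseteq \Psi^{\downarrow\gamma}$. Moreover,
the environment constraints corresponding to $\mathit{Stat}^{\gamma}$ contain the constraint
$\Psi^{\downarrow\gamma} \supseteq \Psi^{\downarrow\alpha}$. It follows that $\mathcal{I}$ must satisfy $\Psi^{\uparrow\beta} \supseteq \Psi^{\downarrow\alpha}$.

If $\mathit{first}(\mathit{Seq}_n)$ is a while-do statement, then $\langle \rho_0 : P \rangle \to^{n} \langle \rho_n : \mathit{Seq}_n \rangle \to
\langle \rho_{n+1} : \mathit{Seq}_{n+1} \rangle$ implies that

$$\langle \rho_0 : P \rangle \to^{n} \langle \rho_n : \mathit{Stat}^{\beta};\mathit{Seq}_b \rangle \to \langle \rho_{n+1} : \mathit{Seq}';\mathit{Stat}^{\alpha};\mathit{Stat}^{\beta};\mathit{Seq}_b \rangle$$

where the body of $\mathit{Stat}^{\beta}$ is $\mathit{Seq}';\mathit{Stat}^{\alpha}$. This implies that the environment
constraints contain $\Psi^{\uparrow\beta} \supseteq \Psi^{\downarrow\alpha}$. []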

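Lemma 2 $\mathcal{I}_{cs}$ is smaller than any model of $\mathcal{EC}_P$.

Proof: To prove the lemma, we need to show that if $\mathcal{I}$ is a model of $\mathcal{EC}_P$
then

$$\mathcal{I}(\Psi^{\mu}) \supseteq \mathcal{I}_{cs}(\Psi^{\mu}) \text{ for all program points } \mu.$$

To prove this, it suffices to show the following two properties

(i) If $\langle \rho_0 : P \rangle \to^{m} \langle \rho : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle$ then $\rho \in \mathcal{I}(\Psi^{\uparrow\alpha})$.

(ii) If $\langle \rho_0 : P \rangle \to^{m-n} \langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to^{n}_{\mathit{Seq}} \langle \rho : \mathit{Seq} \rangle$ then $\rho \in \mathcal{I}(\Psi^{\downarrow\alpha})$

where $m \geq 0$ and $1 \leq n \leq m$. These two properties shall be proved
simultaneously by induction. The primary induction shall be on $m$, with a secondary
induction on $n$. In the base case of $m = 0$, (i) reduces to $\rho_0 \in \mathcal{I}(\Psi^{\uparrow\alpha})$ and
this is trivially true since $\mathcal{I}$ satisfies $\Psi^{\uparrow\alpha} \supseteq \top$. On the other hand, (ii) is
vacuously true since its preconditions cannot be met unless $m \geq 1$.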

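Now, suppose that for some $m'$, (i) and (ii) hold for all $m < m'$, and we
seek proofs of (i) and (ii) when $m = m'$. First consider (ii). The proof for
this case employs a secondary induction on $n$. In the base case of $n = 1$, the
assumptions of (ii) reduce to

$$\langle \rho_0 : P \rangle \to^{m'-1} \langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to \langle \rho : \mathit{Seq} \rangle.$$

From part (i) of the induction hypothesis, $\rho' \in \mathcal{I}(\Psi^{\uparrow\alpha})$. Now, $\mathit{Stat}^{\alpha}$ must
be either an assignment statement, or an if-then or while-do statement whose
condition is not satisfied by $\rho'$. In the first case, let the statement be $X := t$.
This means that $\rho' \triangleright t$ and $\rho$ is $\rho'[X \mapsto v]$ where $v$ is $\rho'(t)$. Now, $\mathcal{I}$ must satisfy
the constraint $\Psi^{\downarrow\alpha} \supseteq \Psi^{\uparrow\alpha}[X \mapsto t]$ and hence combining this with $\rho' \in \mathcal{I}(\Psi^{\uparrow\alpha})$
proves that $\rho \in \mathcal{I}(\Psi^{\downarrow\alpha})$. On the other hand, if $\mathit{Stat}^{\alpha}$ is an if-then or while-do
statement whose condition is not satisfied by $\rho'$, then since $\mathcal{I}$ must satisfy
$\Psi^{\downarrow\alpha} \supseteq \Psi^{\uparrow\alpha}[\neg\mathit{cond}]$, it is again immediate that $\rho = \rho' \in \mathcal{I}(\Psi^{\downarrow\alpha})$.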

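Now suppose that, for some $n' > 1$, (ii) holds when $m = m'$ and $n < n'$,
and assume that

$$\langle \rho_0 : P \rangle \to^{m'-n'} \langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to^{n'}_{\mathit{Seq}} \langle \rho : \mathit{Seq} \rangle.$$

Again, part (i) of the induction hypothesis implies that $\rho' \in \mathcal{I}(\Psi^{\uparrow\alpha})$. Now,
since $n' > 1$, the statement $\mathit{Stat}^{\alpha}$ must be an if-then or while-do statement
whose condition is satisfied by $\rho'$. Hence, it must either be the case that

$$\langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to \langle \rho' : \mathit{Seq}';\mathit{Seq} \rangle \to^{n'-1}_{\mathit{Seq}} \langle \rho : \mathit{Seq} \rangle \quad \text{or}$$
$$\langle \rho' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to \langle \rho' : \mathit{Seq}';\mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to^{n'-1}_{\mathit{Seq}} \langle \rho : \mathit{Seq} \rangle$$

where $\mathit{Seq}'$ is the body of $\mathit{Stat}^{\alpha}$. Consider these two possibilities in turn. In
the first case, let $\mathit{Stat}^{\gamma}$ be the last statement in $\mathit{Seq}'$. Proposition 2 implies
that there exists an environment $\rho''$ and integers $j \geq 0$ and $k \geq 1$ such that
$j + k = n' - 1$ and

$$\langle \rho' : \mathit{Seq}';\mathit{Seq} \rangle \to^{j} \langle \rho'' : \mathit{Stat}^{\gamma};\mathit{Seq} \rangle \to^{k}_{\mathit{Seq}} \langle \rho : \mathit{Seq} \rangle.$$

This implies that

$$\langle \rho_0 : P \rangle \to^{m'-k} \langle \rho'' : \mathit{Stat}^{\gamma};\mathit{Seq} \rangle \to^{k}_{\mathit{Seq}} \langle \rho : \mathit{Seq} \rangle.$$

Since $k < n'$, part (ii) of the induction hypothesis implies that $\rho \in \mathcal{I}(\Psi^{\downarrow\gamma})$.
Now, $\mathcal{I}$ satisfies the constraint $\Psi^{\downarrow\alpha} \supseteq \Psi^{\downarrow\gamma}$ and this proves that $\rho \in \mathcal{I}(\Psi^{\downarrow\alpha})$.

On the other hand, if the derivation has the second form, then Proposition 2
can again be applied, this time to show that there exists an environment
$\rho''$ and integers $j \geq 0$ and $k \geq 1$ such that $j + k = n' - 1$ and

$$\langle \rho' : \mathit{Seq}';\mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to^{j} \langle \rho'' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to^{k}_{\mathit{Seq}} \langle \rho : \mathit{Seq} \rangle.$$

This implies that

$$\langle \rho_0 : P \rangle \to^{m'-k} \langle \rho'' : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to^{k}_{\mathit{Seq}} \langle \rho : \mathit{Seq} \rangle.$$

Since $k < n'$, part (ii) of the induction hypothesis implies that $\rho \in \mathcal{I}(\Psi^{\downarrow\alpha})$.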

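It remains to prove the inductive case for (i). Assume that $\langle \rho_0 : P \rangle \to^{m'}
\langle \rho : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle$. Since $m' \geq 1$, there exist $\rho'$, $\mathit{Stat}^{\beta}$ and $\mathit{Seq}'$ such that

$$\langle \rho_0 : P \rangle \to^{m'-1} \langle \rho' : \mathit{Stat}^{\beta};\mathit{Seq}' \rangle \to \langle \rho : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle.$$

Now, consider the cases of $\mathit{Stat}^{\beta}$. If $\mathit{Stat}^{\beta}$ is an assignment statement or an
if-then or while-do statement whose condition is not satisfied by $\rho'$, then
$\mathit{Seq}'$ must be $\mathit{Stat}^{\alpha};\mathit{Seq}$, and

$$\langle \rho_0 : P \rangle \to^{m'-1} \langle \rho' : \mathit{Stat}^{\beta};\mathit{Stat}^{\alpha};\mathit{Seq} \rangle \to \langle \rho : \mathit{Stat}^{\alpha};\mathit{Seq} \rangle.$$

Now, (ii) has just been proved in the case where $m = m'$, and so $\rho \in
\mathcal{I}(\Psi^{\downarrow\beta})$. Furthermore, Proposition 6 proves that $\mathcal{I}(\Psi^{\uparrow\alpha}) \supseteq \mathcal{I}(\Psi^{\downarrow\beta})$. Hence
$\rho \in \mathcal{I}(\Psi^{\uparrow\alpha})$.

On the other hand, if $\mathit{Stat}^{\beta}$ is an if-then or while-do statement whose
condition is satisfied by $\rho'$, then $\mathit{Stat}^{\alpha}$ must be the first statement
appearing in the body of $\mathit{Stat}^{\beta}$. Hence the environment constraints contain
$\Psi^{\uparrow\alpha} \supseteq \Psi^{\uparrow\beta}[\mathit{cond}]$. Moreover, (i) can be applied to the derivation $\langle \rho_0 : P \rangle \to^{m'-1}
\langle \rho' : \mathit{Stat}^{\beta};\mathit{Seq}' \rangle$ to prove that $\rho' \in \mathcal{I}(\Psi^{\uparrow\beta})$, and so $\rho = \rho' \in \mathcal{I}(\Psi^{\uparrow\alpha})$. This
completes the induction argument for (i), and thus completes the proof of
the lemma. []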

Theorem 2 The collecting semantics $\mathcal{I}_{cs}$ of an imperative program P is the
least model of the environment constraints for P.

Proof: From Lemma 1, $\mathcal{I}_{cs}$ is a model of the environment constraints for
P. From Lemma 2, $\mathcal{I}_{cs}$ is smaller than all other models of the environment
constraints. It follows that $\mathcal{I}_{cs}$ is exactly the least model of the environment
constraints. []


Chapter 4

Logic  Programs 


This chapter presents the background definitions on operational and collecting
semantics for logic programs. Three different semantics are considered:
top-down execution using the PROLOG left-to-right atom selection strategy,
top-down execution using a non-deterministic atom selection strategy (for
modeling certain aspects of parallel execution), and bottom-up execution.
Corresponding to each operational semantics, a collecting semantics is given.
In essence, this involves defining an appropriate notion of program point for
the operational semantics, and then collecting information about program
executions for each program point. The core part of the chapter deals with
constraint based formulations of the collecting semantics. These formulations
are similar in nature to equational formulations of collecting semantics
used in other works on logic program analysis. One difference is that we
use constraints instead of equations, and this leads to a more general and
flexible framework.


4.1 Logic Programs

We begin with some preliminary definitions about logic programs. Let $\Sigma$
denote a set of function symbols, let $\Pi$ denote the set of predicate symbols
and let VAR be a denumerable set of program variables. It is assumed that
$\Sigma$, $\Pi$ and VAR are disjoint. Each function symbol $f \in \Sigma$ and each predicate
symbol $p \in \Pi$ is assumed to have a unique arity. A function symbol of arity
0 is called a constant.

A (logic program) term is either a program variable from VAR, or of the
form $f(t_1, \ldots, t_n)$ where $n \geq 0$, $f$ is a function symbol from $\Sigma$ with arity
$n$, and each $t_i$ is a term. An atom is of the form $p(t_1, \ldots, t_n)$ where $n \geq 0$,
$p$ is a predicate symbol from $\Pi$ with arity $n$, and each $t_i$ is a term. Atoms
shall be denoted by A, B or C. A rule is of the form $A_0 \leftarrow A_1, \ldots, A_n$ where
each $A_i$ is an atom. The atom $A_0$ is called the head of the rule and the
sequence $A_1, \ldots, A_n$ is called the body of the rule. Each $A_i$, $i \geq 1$, is called
a body atom. In the case where $n = 0$, the rule is called a fact. Rules shall
be denoted by R. A term, atom or rule is ground if it does not contain any
program variables.

A logic program $P$ is a finite set of rules. Each rule in a program $P$ is
labeled with a unique integer, called a rule label. Likewise each body atom
in $P$ is labeled with a unique integer called a body atom label. We again
denote labels by $\alpha$, $\beta$, $\gamma$ (possibly subscripted). Writing
$R^\alpha \in P$ indicates that the rule $R$ appears in $P$ with label
$\alpha$. We say that the rule in $P$ with label $\alpha$ is
$A_0 \leftarrow A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$ if the rule
$R^\alpha \in P$ has head $A_0$, body $A_1, \ldots, A_n$ and body atom
labels $\alpha_1, \ldots, \alpha_n$. Similarly we say that the body of
$R^\alpha$ is $A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$ if the body of
$R^\alpha \in P$ is $A_1, \ldots, A_n$ and the labels of these body atoms
are $\alpha_1, \ldots, \alpha_n$ respectively. $A^\alpha$ is a body atom in
$P$ if $\alpha$ is a body atom label and $A$ is the body atom that appears
in $P$ with label $\alpha$. $A^\alpha$ is a head atom in $P$ if $\alpha$ is
a rule label and the head of the rule $R^\alpha \in P$ is $A$.

A substitution $\theta$ is a mapping from VAR into terms. Note that this
definition is somewhat non-standard. In the literature, a substitution
$\theta$ is typically required to satisfy $\theta(X) = X$ for all but a
finite number of variables. However, for our purposes it is convenient to
drop this restriction. Substitutions shall be written in prefix notation.
For example the result of applying $\theta$ to $X$ shall be written as
$\theta(X)$. A substitution $\theta$ can be extended to map from terms (or
atoms or rules) into terms (or atoms or rules, respectively) in the usual
way, by replacing each variable occurrence $X$ by $\theta(X)$.

A renaming is a substitution that is a bijection on VAR. If $\theta$ is a
renaming, then $\theta^{-1}$ denotes the substitution that maps $\theta(X)$
into $X$ for all variables $X$.


An environment (or valuation) $\rho$ is a substitution such that, for each
variable $X \in$ VAR, $\rho(X)$ is a ground term (or value). Although
"valuation" is the more standard terminology in the context of logic
programs, we use "environment" to maintain consistency with previous
definitions. If $\rho$ is an environment and $exp$ is a term, atom or rule,
then $\rho(exp)$ is a ground instance of $exp$. If $\rho$ is an environment
and $\theta$ is a renaming substitution, then $\rho \circ \theta$ denotes
the environment that maps $X$ into $\rho(\theta(X))$ for all program
variables $X$.


An equation is of the form $s = t$ where $s$ and $t$ are both terms or both
atoms. An equation conjunction $E$ is a finite collection of equations, and
is written in the form $s_1 = t_1 \wedge \cdots \wedge s_n = t_n$ (the empty
conjunction is denoted by true). An environment $\rho$ satisfies an equation
$s = t$ if $\rho(s)$ and $\rho(t)$ are identical ground terms or atoms. An
environment $\rho$ satisfies an equation conjunction if it satisfies each
equation in the conjunction. We write $\rho \models E$ to denote that $\rho$
satisfies $E$.
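
The satisfaction relation just defined can be made concrete with a small
sketch. The encoding below is an assumption made for illustration and is not
taken from the thesis: variables are Python strings, compound terms and atoms
are tuples headed by their symbol, and an environment is a dictionary that
only lists the variables of interest (a true environment is total on VAR).

```python
# Assumed encoding (illustrative only): a variable is a str such as "X";
# a compound term or atom is a tuple (symbol, arg1, ..., argn); a constant
# is a 1-tuple such as ("a",).

def apply_env(env, t):
    """Apply an environment (dict from variable to ground term) to t."""
    if isinstance(t, str):
        return env[t]
    return (t[0],) + tuple(apply_env(env, arg) for arg in t[1:])


def satisfies(env, conjunction):
    """True if env makes both sides of every equation (s, t) identical."""
    return all(apply_env(env, s) == apply_env(env, t) for s, t in conjunction)


a, b = ("a",), ("b",)
# The equation conjunction p(X) = p(Y) /\ q(Y) = q(b).
E = [(("p", "X"), ("p", "Y")), (("q", "Y"), ("q", b))]

print(satisfies({"X": b, "Y": b}, E))  # True
print(satisfies({"X": a, "Y": b}, E))  # False
```

The empty list plays the role of the empty conjunction true, which every
environment satisfies.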

Two atoms $A$ and $B$ are unifiable if $A = B$ is satisfiable. $A$ and $B$
are compatible if $A$ and $\theta(B)$ are unifiable for some renaming
substitution $\theta$. For example, $p(X)$ and $p(f(X))$ are compatible but
not unifiable. Compatibility corresponds to unifiability where renaming can
be performed to avoid variable name clashes. Where $exp_1, \ldots, exp_n$,
$n \geq 1$, is a sequence of terms, atoms, rules or equation conjunctions,
$var(exp_1, \ldots, exp_n)$ denotes the set of all program variables that
appear in $exp_1, \ldots, exp_n$.
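
The difference between unifiable and compatible can be checked mechanically.
The following is a minimal sketch, not the thesis's algorithm: it assumes the
same illustrative tuple encoding of terms (variables as strings), uses a
textbook unification procedure with occurs check to decide satisfiability of
$A = B$ over finite ground terms, and decides compatibility by renaming the
variables of $B$ apart with a prime suffix (which assumes no original
variable name already contains a prime).

```python
def is_var(t):
    return isinstance(t, str)


def walk(t, subst):
    """Follow variable bindings until an unbound variable or a compound term."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """True if variable v occurs in t under the bindings in subst."""
    t = walk(t, subst)
    if is_var(t):
        return v == t
    return any(occurs(v, arg, subst) for arg in t[1:])


def unify(s, t):
    """Return a unifying substitution (dict) for s and t, or None."""
    subst = {}
    stack = [(s, t)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x) or is_var(y):
            v, other = (x, y) if is_var(x) else (y, x)
            if occurs(v, other, subst):
                return None  # no finite term solves v = f(... v ...)
            subst[v] = other
        elif x[0] == y[0] and len(x) == len(y):
            stack.extend(zip(x[1:], y[1:]))
        else:
            return None  # symbol or arity clash
    return subst


def rename_apart(t):
    """Rename every variable X to X' (hypothetical fresh-name scheme)."""
    if is_var(t):
        return t + "'"
    return (t[0],) + tuple(rename_apart(arg) for arg in t[1:])


def unifiable(a, b):
    return unify(a, b) is not None


def compatible(a, b):
    # If some renaming of b unifies with a, then so does one that shares
    # no variables with a, so testing a single fresh renaming suffices.
    return unify(a, rename_apart(b)) is not None


A, B = ("p", "X"), ("p", ("f", "X"))
print(unifiable(A, B), compatible(A, B))  # False True
```

The printed result reproduces the example in the text: $p(X)$ and $p(f(X))$
clash only because they share the variable $X$.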


4.2  Operational  Semantics 


We now present three different operational semantics for logic programs:
the first two are top-down semantics, and the last is a bottom-up semantics.
The motivation for presenting more than one operational semantics for logic
programs is twofold. First, by using a variety of operational models, we are
better able to illustrate the process of constructing environment
constraints (and, subsequently, set constraints) from a program. Second, it
provides some evidence to support the wider claim that set based analysis is
a general methodology for analyzing programs and is not tied to a particular
notion of operational semantics.

We begin by describing the two top-down semantics. First, a goal is of the
form $\leftarrow A_1, \ldots, A_n$, $n \geq 1$, where each $A_i$ is an atom.
Now, given such a goal, the usual definition of logic program execution
involves repeatedly reducing this goal using the rules of the program.
Informally, this process can be described as follows: given a goal $G_0$, a
sequence of goals $G_0, G_1, \ldots$ is defined such that each goal
$G_{i+1}$ in the sequence is obtained from its predecessor $G_i$ by
selecting an atom $A$ from $G_i$ and a (renamed) rule
$B_0 \leftarrow B_1, \ldots, B_k$ from $P$, unifying $A$ and $B_0$ to obtain
a unifier $\theta$, replacing $A$ in $G_i$ by $B_1, \ldots, B_k$, and then
applying $\theta$ to the result. We shall instead adopt a CLP style
formulation of program semantics [28]. This is done for two reasons. First,
it simplifies certain aspects of the presentation, and second, it leads to a
more general formulation that is directly applicable to other CLP languages.
The main difference in the CLP approach is that the notion of unification is
replaced by equations (and, more generally, constraints), and the notion of
goal is generalized to include two components: an equation conjunction and a
sequence of atoms. We now present the details.

An atom selection function is a function that maps any sequence of atoms
$A_1, \ldots, A_n$ into an index $i$, $1 \leq i \leq n$. We say that $A_i$
is selected from $A_1, \ldots, A_n$ and refer to $A_i$ as the selected atom.
A (top-down) state is of the form $\langle E : G \rangle$ where $E$ is a
satisfiable equation conjunction and $G$ is a sequence of atoms (the empty
sequence of atoms is denoted by empty). The top-down operational semantics
is defined using a rewrite relation between states. Specifically, in the
context of some atom selection function, there is a derivation step
$\langle E : G \rangle \rightarrow_{i,\alpha,\theta} \langle E' : G' \rangle$ if

(i)  $G$ is $A_1, \ldots, A_m$ and the atom selection function maps $G$ into $i$;

(ii)  the rule $R^\alpha$ with label $\alpha$ in $P$ is of the form $B_0 \leftarrow B_1, \ldots, B_r$;

(iii)  $\theta$ is a renaming substitution such that $var(\theta(R^\alpha)) \cap var(E, G) = \{\}$;

(iv)  $G'$ is $A_1, \ldots, A_{i-1}, \theta(B_1), \ldots, \theta(B_r), A_{i+1}, \ldots, A_m$; and

(v)  $E'$ is $E \wedge (A_i = \theta(B_0))$.


Note that, by definition of states, the equation conjunction
$E \wedge (A_i = \theta(B_0))$ must be satisfiable, and this implicitly
requires that $A_i$ and $\theta(B_0)$ be unifiable.

A derivation $D$ is a sequence of derivation steps of the form

\[ \langle E_0 : G_0 \rangle \rightarrow \langle E_1 : G_1 \rangle \rightarrow \cdots \rightarrow \langle E_n : G_n \rangle. \]

We say that $D$ is a derivation from $\langle E_0 : G_0 \rangle$ to
$\langle E_n : G_n \rangle$. Note that some works on logic programming
semantics define derivations to be maximal (finite or infinite) sequences of
derivation steps. However, for our purposes it is more convenient to use
finite derivations and to omit the requirement of maximality. In particular
this means that any contiguous subsequence of the steps in a derivation is
also a derivation.

The meaning of a program $P$ is defined using the successful derivations,
which are the derivations that end in a state of the form
$\langle E : empty \rangle$. Specifically, a program $P$ defines a function
$[\![\,\cdot\,]\!]_P$ that maps goals $\leftarrow G$ into sets of equation
conjunctions as follows:

\[ [\![\leftarrow G]\!]_P \stackrel{\mathrm{def}}{=} \{E : \text{there is a derivation from } \langle true : G \rangle \text{ to } \langle E : empty \rangle\}. \]

By varying the atom selection function in the above definitions, different
operational semantics are obtained. If the selection function always selects
the leftmost atom from a sequence, then the usual PROLOG style left-to-right
semantics is obtained. To illustrate this selection function, consider the
logic program and goal in Figure 4.1. Consider the following derivation (in
which subscripts on derivation steps have been omitted for clarity) starting
from the state $\langle true : \leftarrow p(X) \rangle$

\[ \langle true : p(X) \rangle \rightarrow \langle E_1 : q(Y), r(Y) \rangle \rightarrow \langle E_2 : r(Y) \rangle \rightarrow \langle E_3 : empty \rangle \]

where $E_1$, $E_2$ and $E_3$ are given by

$E_1$:  $(p(X) = p(Y))$.

$E_2$:  $(p(X) = p(Y) \wedge q(Y) = q(b))$.

$E_3$:  $(p(X) = p(Y) \wedge q(Y) = q(b) \wedge r(Y) = r(b))$.

This derivation is maximal, and is in fact the only maximal derivation for
$\langle true : \leftarrow p(X) \rangle$, modulo variable renaming. Now,
consider a selection




CHAPTER  4.  LOGIC  PROGRAMS 


← p(X).

p(Y) ← q(Y), r(Y).

q(b).

r(a).

r(b).

Figure 4.1: Program 7


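function that chooses an atom non-deterministically. Using such a selection
function, there are two additional maximal derivations from
$\langle true : p(X) \rangle$:

\[ \langle true : p(X) \rangle \rightarrow \langle E_1 : q(Y), r(Y) \rangle \rightarrow \langle E_2' : q(Y) \rangle, \text{ and} \]

\[ \langle true : p(X) \rangle \rightarrow \langle E_1 : q(Y), r(Y) \rangle \rightarrow \langle E_2'' : q(Y) \rangle \rightarrow \langle E_3 : empty \rangle \]

where $E_2'$ and $E_2''$ are given by

$E_2'$:  $(p(X) = p(Y) \wedge r(Y) = r(a))$.

$E_2''$:  $(p(X) = p(Y) \wedge r(Y) = r(b))$.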

We refer to the semantics induced by this non-deterministic atom selection
as the interleaving semantics since it allows arbitrary interleaving of the
solving of atoms. Although this semantics is still sequential in nature, it
does capture the essence of certain aspects of parallel execution since it
makes no commitment to the order in which atoms are selected. In doing so,
it provides a basis for illustrating how set based analysis might be used to
analyze parallel logic programs, without having to deal with the specific
details of a parallel logic programming language. The development of
collecting semantics and environment constraints shall use these two
top-down semantics, although it is possible to accommodate other atom
selection functions.
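
The effect of the selection function on Program 7 can be reproduced with a
short enumeration of maximal derivations. This is a sketch under stated
assumptions rather than the thesis's formulation: terms use an illustrative
tuple encoding (variables as strings), satisfiability of the accumulated
equation conjunction is tracked with a standard unification substitution,
rule variables are renamed with a hypothetical step-number suffix, and the
names `leftmost` and `interleaving` are introduced here for the two
selection strategies.

```python
def is_var(t):
    return isinstance(t, str)


def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    return v == t if is_var(t) else any(occurs(v, a, s) for a in t[1:])


def unify(x, y, s):
    """Extend substitution s so that x and y become equal, or return None."""
    s = dict(s)
    stack = [(x, y)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x) or is_var(y):
            v, t = (x, y) if is_var(x) else (y, x)
            if occurs(v, t, s):
                return None
            s[v] = t
        elif x[0] == y[0] and len(x) == len(y):
            stack.extend(zip(x[1:], y[1:]))
        else:
            return None
    return s


def rename(t, n):
    """Rename variables apart by tagging them with the step number n."""
    if is_var(t):
        return f"{t}_{n}"
    return (t[0],) + tuple(rename(a, n) for a in t[1:])


def maximal_derivations(rules, goal, choose):
    """Yield (equations, final goal) for each maximal derivation."""
    def go(eqs, subst, goal, step):
        successors = []
        for i in choose(goal):
            for head, body in rules:
                h = rename(head, step)
                s2 = unify(goal[i], h, subst)
                if s2 is not None:  # E /\ (A_i = theta(B_0)) is satisfiable
                    new_goal = (goal[:i] + [rename(b, step) for b in body]
                                + goal[i + 1:])
                    successors.append((eqs + [(goal[i], h)], s2, new_goal))
        if not successors:
            yield eqs, goal  # no further step is possible
        for e, s, g in successors:
            yield from go(e, s, g, step + 1)
    yield from go([], {}, list(goal), 1)


def leftmost(goal):
    return [0] if goal else []


def interleaving(goal):
    return list(range(len(goal)))


a, b = ("a",), ("b",)
PROGRAM_7 = [
    (("p", "Y"), [("q", "Y"), ("r", "Y")]),
    (("q", b), []),
    (("r", a), []),
    (("r", b), []),
]
GOAL = [("p", "X")]

left = list(maximal_derivations(PROGRAM_7, GOAL, leftmost))
inter = list(maximal_derivations(PROGRAM_7, GOAL, interleaving))
print(len(left), len(inter))                 # 1 3
print(sum(1 for _, g in inter if not g))     # 2 successful derivations
```

Under the interleaving strategy the enumeration finds the left-to-right
derivation plus the two additional ones described above; the derivation that
selects $r(Y)$ with the fact $r(a)$ is maximal but not successful.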

We now present the third operational semantics for logic programs. A
(bottom-up) state is of the form $\langle E : A \rangle$ where $E$ is a
satisfiable equation conjunction and $A$ is an atom. Such a state is
bottom-up derivable if, for some $n \geq 0$, there is a rule $R$ of the form
$A \leftarrow A_1, \ldots, A_n$ in $P$, renaming substitutions
$\theta_1, \ldots, \theta_n$, and bottom-up derivable states
$\langle E_1 : B_1 \rangle, \ldots, \langle E_n : B_n \rangle$ such that


•  $var(R)$, $var(\theta_1(E_1), \theta_1(B_1))$, ..., $var(\theta_n(E_n), \theta_n(B_n))$ are all disjoint
sets, and

•  $E$ is $(A_1 = \theta_1(B_1)) \wedge \theta_1(E_1) \wedge \cdots \wedge (A_n = \theta_n(B_n)) \wedge \theta_n(E_n)$.


Thus, for example, if $P$ is the program consisting of $p(a)$ and
$q(X) \leftarrow p(X)$ then $\langle true : p(a) \rangle$ and
$\langle X = a : q(X) \rangle$ are both bottom-up derivable states.

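As before, the meaning of a program $P$ is defined as a mapping
$[\![\,\cdot\,]\!]_P$ from goals into sets of equation conjunctions.
Specifically, each goal $\leftarrow A_1, \ldots, A_n$ is mapped into:

\[ [\![\leftarrow A_1, \ldots, A_n]\!]_P = \left\{ (A_1 = \theta_1(B_1) \wedge \theta_1(E_1)) \wedge \cdots \wedge (A_n = \theta_n(B_n) \wedge \theta_n(E_n)) \;:\; \text{the } \langle E_i : B_i \rangle \text{ are bottom-up derivable} \right\} \]

where the $\theta_1, \ldots, \theta_n$ are renaming substitutions such that
the sets $var(R)$, $var(\theta_1(E_1), \theta_1(B_1))$, ...,
$var(\theta_n(E_n), \theta_n(B_n))$ are all disjoint.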

We now compare our definition of bottom-up semantics with the more usual
definition of bottom-up semantics that is based on the $T_P$ function. The
$T_P$ function maps from and into sets of ground atoms and can be defined as
follows:

\[ T_P(S) = \{ A : A \leftarrow A_1, \ldots, A_n \text{ is a ground instance of a rule in } P \text{ and } \{A_1, \ldots, A_n\} \subseteq S \}. \]

In essence, given a set $S$ of "assumptions", $T_P(S)$ is the set of
consequences that are derivable in one step using $P$. Hence $T_P$ is often
called the immediate consequence operator. Now, $T_P$ is a continuous
function, and hence it has a least fixed point, denoted $lfp(T_P)$, which
can be computed as the limit of the sequence
$\{\}, T_P(\{\}), T_P(T_P(\{\})), \ldots$. One of the main results of the
standard semantics of logic programs is that, given a program $P$, the set
of successful ground atoms, the least fixed point of $T_P$ and the least
Herbrand model of $P$ all coincide (see [7, 41]).
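
The iteration $\{\}, T_P(\{\}), T_P(T_P(\{\})), \ldots$ can be sketched
directly for a program whose ground instances are finite. The example below
is an illustration only: it assumes ground atoms are written as plain
strings, and it uses the ground instances of Program 7 (Figure 4.1) over the
constants a and b, which is a choice made here for concreteness.

```python
def tp(ground_rules, s):
    """Immediate consequences: heads of ground rules whose bodies lie in s."""
    return frozenset(head for head, body in ground_rules
                     if all(atom in s for atom in body))


def lfp(ground_rules):
    """Iterate T_P from the empty set until nothing changes.

    Terminates because the rule set is finite and T_P is monotone.
    """
    s = frozenset()
    while True:
        nxt = tp(ground_rules, s)
        if nxt == s:
            return s
        s = nxt


# Ground instances of Program 7 over the constants a and b.
GROUND_PROGRAM_7 = [
    ("p(a)", ["q(a)", "r(a)"]),
    ("p(b)", ["q(b)", "r(b)"]),
    ("q(b)", []),
    ("r(a)", []),
    ("r(b)", []),
]

print(sorted(tp(GROUND_PROGRAM_7, frozenset())))  # ['q(b)', 'r(a)', 'r(b)']
print(sorted(lfp(GROUND_PROGRAM_7)))              # ['p(b)', 'q(b)', 'r(a)', 'r(b)']
```

The fixed point contains $p(b)$ but not $p(a)$, matching the single
successful answer found by the top-down derivations for this program.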

Our definition of bottom-up semantics is more operational in nature. It is
structured in a way that emphasizes similarities with the top-down
operational semantics. This includes using equations (as opposed to using
ground atoms) and defining program semantics as a mapping from the program
goals into equations (as opposed to defining program semantics as a set of
ground atoms). A formal correspondence between our bottom-up semantics and
the $T_P$ semantics appears in the discussion following Lemma 6 on page 99.
However these differences between the two are not fundamental in nature.

r(X) ← p(X), q(X).

p(f(a)).

p(f(b)).

p(g(a)).

q(f(Z)).

q(g(a)).

Figure 4.2: Program 8


Note that the definition of bottom-up derivable is closely related to the
variation of the $T_P$ function due to Jaffar and Lassez [28], which can be
described as follows. Let $I$ be a set of goals and define $T_P(I)$ to be

\[ \left\{ \left\langle \bigwedge_{i=1}^{n} (\theta_i(E_i) \wedge A_i = \theta_i(B_i)) : A_0 \right\rangle \;:\; A_0 \leftarrow A_1, \ldots, A_n \text{ is a rule in } P \text{ and } \langle E_i : B_i \rangle \in I,\ 1 \leq i \leq n \right\} \]

such that each $\theta_i$ is a renaming substitution and the sets
$var(A_0, \ldots, A_n)$, $var(\theta_1(E_1), \theta_1(B_1))$, ...,
$var(\theta_n(E_n), \theta_n(B_n))$ are all disjoint. The least fixed point
of this function is exactly the set of bottom-up derivable states.


4.3  Comparison  of  Operational  Semantics 

To illustrate the differences between the three definitions of operational
semantics, consider the logic program and goal in Figure 4.2. The three
definitions of semantics given in the previous section are all equivalent in
the sense that the equation conjunctions collected for each goal are
equivalent. For the indicated goal $\leftarrow r(f(U))$, they all give the
following set of equation conjunctions (after simplifying
$r(f(U)) = r(X)$ into $f(U) = X$, etc.).

\[ \left\{ \begin{array}{l} f(U) = X \wedge X = f(a) \wedge X = f(Z), \\ f(U) = X \wedge X = f(b) \wedge X = f(Z) \end{array} \right\} \]

However the definitions differ in how this set is obtained, and this has
important consequences for the corresponding collecting semantics as well as
for approximation of these semantics. For example, the following is a valid
derivation in the top-down interleaving semantics

\[ \begin{array}{ll} \langle true : r(f(U)) \rangle & \rightarrow\ \langle f(U) = X : p(X), q(X) \rangle \\ & \rightarrow\ \langle f(U) = X \wedge X = f(Z) : p(X) \rangle \\ & \rightarrow\ \langle f(U) = X \wedge X = f(Z) \wedge X = f(a) : empty \rangle \end{array} \]

whereas it is not a valid derivation in the top-down left-to-right
semantics. This means that properties of derivations relating to the
equations encountered at points during program execution vary from semantics
to semantics. Hence the notions of collecting semantics arising from the two
top-down semantics shall differ significantly. There is an even greater
distinction between these and the collecting semantics arising from the
bottom-up semantics because the sets of program points used are different.

One useful way to compare all three semantics is to consider the equations
collected for each goal and focus on the order in which the equations are
collected. Specifically, consider the equation conjunction

\[ f(U) = X \wedge X = f(a) \wedge X = f(Z) \]

which corresponds to the matching of $r(f(U))$ with $r(X)$, $p(X)$ with
$p(f(a))$, and $q(X)$ with $q(f(Z))$. This equation conjunction can be
viewed as the composition of the three basic equations $f(U) = X$,
$X = f(Z)$ and $X = f(a)$; the difference between the three semantics is in
the order in which these basic equations are combined. We illustrate this in
the following table, where parentheses are used to indicate the order of the
combination of basic equations.


top-down left-to-right    $(f(U) = X \wedge X = f(a)) \wedge X = f(Z)$

top-down interleaving     $(f(U) = X \wedge X = f(a)) \wedge X = f(Z)$
                          $(f(U) = X \wedge X = f(Z)) \wedge X = f(a)$

bottom-up                 $f(U) = X \wedge (X = f(Z) \wedge X = f(a))$

Note that although the resulting equation conjunctions are equivalent in
each case, since $\wedge$ is associative and commutative, there are often
important differences when $\wedge$ is replaced by some approximate notion
of conjunction. In particular, many works on program analysis can be
understood by replacing the operation of $\wedge$ by some conservative
approximation of $\wedge$. This new operation is in many cases not
associative or commutative, and hence the approximations of a program
induced by the three different semantics often differ. We have, for
simplicity, ignored the treatment of disjunction. However the basic
observation about analysis of logic programs can be generalized as follows:
exact operations are replaced by approximate operations, and since the
algebraic properties of the underlying operations (such as associativity and
commutativity of conjunction and disjunction, and distributivity of
conjunction and disjunction) rarely hold for their approximate counterparts,
the approximations induced by the various semantics do not usually coincide.


4.4  Collecting  Semantics 


In program analysis, we are not primarily interested in the result of a
computation, but rather in what happens during the computation. In the
context of logic programs, what we desire is information about the equations
that arise at various points during program execution. A collecting
semantics formalizes this notion by explicitly collecting the set of
equations encountered at each program point. In other words, the collecting
semantics serves to make explicit information that is already implicit in
the operational semantics.

For the two top-down semantics, we shall define collecting semantics in the
context of an initial goal $\leftarrow G_0$. As for the imperative program
case, this represents a choice. We could, for example, define the collecting
semantics using a set of initial goals, or perhaps all goals. In the context
of logic program analysis, the use of a restricted set of initial goals
seems most appropriate, and we have used a single initial goal mainly for
presentational simplicity. We note that it is straightforward to extend the
set based analysis of logic programs to deal with a set of initial goals
that is either finite or can be described using regular term grammars. In
what follows, it shall often be convenient to treat the initial goal as part
of the program, and treat the goal itself as a rule without a head. In
particular, the atoms in the initial goal shall be referred to as body
atoms.

We begin by discussing appropriate notions of program point for logic
programs. Recall that each rule and body atom in a program has a unique
label. We also assume that the initial goal (if any) is labeled with a rule
label and that each atom of the initial goal is labeled with a body atom
label. These labels shall be used to denote program points as follows. A
rule label shall indicate the execution state just after the rule has
finished executing, or in other words, just after every body atom has been
solved. A body atom label indicates the state just before execution of the
body atom, or in other words, just before the body atom is selected. In
essence, this definition is just a formalization of the notion of program
point employed in the introduction (Chapter 3). However, there is a small
difference in the details. In particular, the use of textual markers (Ⓐ, Ⓑ,
etc.) is somewhat inconvenient for dealing with programs in a uniform way,
and so we have chosen to use labels attached to program atoms. As an
example, where previously we may have written
$p(f(X,Y)) \leftarrow$ Ⓐ, $q(X,Y)$, Ⓑ to indicate that Ⓐ denotes the point
just before execution of $q(X,Y)$ and Ⓑ indicates the point just after the
execution of the rule body, we now write the rule

2.  $p(f(X,Y)) \leftarrow q(X,Y)^1$

where the atom label 1 indicates the point just before the execution of
$q(X,Y)$, and the rule label 2 (which labels the entire rule) indicates the
point just after the execution of the rule body.

We also observe that the notion of program point captured in the above
formalization represents a choice among many possible approaches. It is
possible to consider more elaborate notions of program point that take into
account the context in which an atom is "called". For example, consider
extending the notion of "program point" so that it includes an additional
label that indicates an atom's "parent". The issue of choosing a notion of
program point arises in any approach to program analysis, and is largely
orthogonal to the details of set based analysis. We have chosen a notion of
program point that is simple, intuitive and has proven to be useful.

Note that a notion of program point that is appropriate for one operational
model may not be particularly appropriate for another operational model. For
example, the formalization used here is appropriate for a top-down
left-to-right operational model because the atoms in the body of a rule are
selected in left-to-right order (and so the atoms can be thought of as
successive statements or procedure calls). It is marginally less appropriate
for the top-down interleaving model because in this case there is no notion
of order between body atoms and the execution of body atoms can be
interleaved. It is even less appropriate for the bottom-up operational model
because in this model there is no notion of the state "before" a body atom
is selected. (In fact the collecting semantics corresponding to the
bottom-up semantics shall completely ignore the program points corresponding
to body atom labels.)

Before defining the collecting semantics, we need some preliminary
definitions. First, we extend the operational semantics to take into account
program labels. This involves using labeled atoms in states, and refining
the definition of derivation step so that when the body of a rule is
inserted into a goal, the labels on the body atoms are retained.
Specifically, let the rule used in a derivation step be
$A_0 \leftarrow A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$, and let the
renaming used be $\theta$; then the new atoms introduced by the derivation
step are $\theta(A_1)^{\alpha_1}, \ldots, \theta(A_n)^{\alpha_n}$.

Now, consider a derivation $D$ and suppose that $D$ contains a step of the
form

\[ \langle E : G \rangle \rightarrow_{i,\alpha,\theta} \langle E' : G' \rangle. \]

Let $G$ have the form $A_1, \ldots, A_m$ and let the rule in $P$ with label
$\alpha$ be $B_0 \leftarrow B_1^{\alpha_1}, \ldots, B_r^{\alpha_r}$. The
goal $G'$ contains the atoms
$\theta(B_1^{\alpha_1}), \ldots, \theta(B_r^{\alpha_r})$ that do not appear
in $G$. These new atoms are said to be introduced using $\theta$. We also
define that each of the atoms $\theta(B_1), \ldots, \theta(B_r)$ is called a
child of the atom $A_i$ selected from $G$. The transitive closure of the
child relation is used to define a descendant relation. Specifically, an
atom $A^\alpha$ is a descendant of an atom $B^\beta$ if either $A^\alpha$ is
a child of $B^\beta$ or else $A^\alpha$ is a child of a descendant of
$B^\beta$.

A derivation $D$ solves an atom $A^\alpha$ if $A^\alpha$ is selected at some
step in the derivation and all of the descendants of $A^\alpha$ are solved
in subsequent steps of $D$. A derivation $D$ minimally solves an atom
$A^\alpha$ if $A^\alpha$ is solved by $D$ but is not solved by the
derivation consisting of all but the last step of $D$. In other words,
$A^\alpha$ is minimally solved in $D$ if (i) $A^\alpha$ is selected from the
$j$-th state in $D$, (ii) the last state in $D$ does not contain any
descendants of $A^\alpha$, and (iii) all states between the $j$-th state and
the last state contain a descendant of $A^\alpha$. Intuitively, this occurs
when the subproof of $A^\alpha$ (contained in $D$) is completed by the last
step of $D$. Strictly speaking, the notions we have just defined require
atoms in a derivation to be labeled with auxiliary information such as the
derivation step where they were introduced. Clearly this can be done, and we
omit the details.

The collecting semantics is defined using the class of derivations whose
first state is $\langle true : G_0 \rangle$. Let $D$ be such a derivation.
$D$ is said to select $A^\alpha$ under renaming $\theta$ if $A^\alpha$ is
the atom selected from the last state in $D$ and $A^\alpha$ is introduced
using the renaming $\theta$. In the case where the selected atom $A^\alpha$
is an atom from the starting state $\langle true : G_0 \rangle$, $D$ is said
to select $A^\alpha$ under $\theta_{id}$ where $\theta_{id}$ is the identity
renaming substitution. The derivation $D$ returns from the rule $R^\alpha$
under renaming $\theta$ if some step of $D$ is of the form
$\langle E_a : G_a \rangle \rightarrow_{i,\alpha,\theta} \langle E_b : G_b \rangle$
and the atoms introduced by this step are solved in the subsequent steps of
$D$ and one of the introduced atoms is minimally solved in the subsequent
steps of $D$. In other words, the last descendant of the atoms introduced by
$\langle E_a : G_a \rangle \rightarrow_{i,\alpha,\theta} \langle E_b : G_b \rangle$
is solved in the last step of $D$. $D$ returns from $G_0^\alpha$ under
renaming $\theta_{id}$ if the last state of $D$ has the form
$\langle E : empty \rangle$ and $\alpha$ is the label of $G_0$.

We can now present the collecting semantics corresponding to the top-down
and bottom-up operational models. In each case this consists of a mapping
$CS_P$ from (some subset of) the program points into sets of equation
conjunctions. We begin with the top-down collecting semantics. The following
definition is parameterized by the atom selection function; it serves to
define both the top-down left-to-right collecting semantics and the top-down
interleaving semantics.


Definition 3  Given an initial goal $\leftarrow G_0$, the top-down collecting
semantics of a logic program $P$ is the mapping $CS_P$ such that

$CS_P(\alpha)$:  there is a derivation from $\langle true : G_0 \rangle$ to
$\langle E : G \rangle$ that selects $A^\alpha$ under $\theta$

$CS_P(\beta)$:  there is a derivation from $\langle true : G_0 \rangle$ to
$\langle E : G \rangle$ that returns from $R^\beta$ under $\theta$

where $\alpha$ ranges over all body atom labels and $\beta$ ranges over all
rule labels.  □


Crucial to this definition is the specific formulation of "selects" and
"returns from". In particular, the definition of "selects" refers only to
the last state of the derivation. In other words, a derivation selects
$A^\alpha$ under $\theta$ if $A^\alpha$ is the atom selected from the last
state of the derivation. This means that the equation conjunction at the end
of the derivation is the conjunction encountered just as $A^\alpha$ is about
to start execution. Importantly, if an atom $A^\alpha$ is selected from some
state other than the last state in a derivation $D$, then by considering the
sequence of derivation steps up to the point that $A^\alpha$ is selected, we
can construct another derivation $D'$ such that $A^\alpha$ is selected from
the last state in $D'$. Similarly the definition of "returns from" refers to
rule uses that are completed during the last step of the derivation. That
is, a derivation returns from $R^\alpha$ under $\theta$ if the solving of
the atoms introduced by the indicated use of $R^\alpha$ is completed during
the last step of the derivation. Again, this is done so that the equation
conjunction at the end of the derivation is the conjunction encountered just
after execution of the rule has been completed.


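Definition 4  The bottom-up collecting semantics of a logic program $P$ is
the mapping $CS_P$ such that, for each rule $R^\alpha$ with body
$A_1, \ldots, A_n$,

\[ CS_P(\alpha) = \left\{ \theta_1(E_1) \wedge (A_1 = \theta_1(B_1)) \wedge \cdots \wedge \theta_n(E_n) \wedge (A_n = \theta_n(B_n)) \;:\; \langle E_1 : B_1 \rangle, \ldots, \langle E_n : B_n \rangle \text{ are bottom-up derivable} \right\} \]

where $\theta_1, \ldots, \theta_n$ are such that $var(R)$,
$var(\theta_1(E_1), \theta_1(B_1))$, ..., $var(\theta_n(E_n), \theta_n(B_n))$
are disjoint sets of variables.  □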


To illustrate the differences between these different collecting semantics,
consider the logic program in Figure 4.3. An immediate difference between
the bottom-up and top-down collecting semantics is that the bottom-up
collecting semantics does not say anything about program points 1 and 2. The
two top-down semantics differ at points 1 and 2. Note that for point 3, each
collecting semantics collects the singleton set consisting of the equation
conjunction $W = f(X) \wedge W = f(Z)$ (strictly speaking, the top-down
interleaved semantics also collects $W = f(Z) \wedge W = f(X)$, but since
$\wedge$ is commutative we consider this equation conjunction to be
identical to $W = f(X) \wedge W = f(Z)$).

3.  ← p(W)^1, q(W)^2.

4.  p(f(X)).

5.  p(g(Y)).

6.  q(f(Z)).


program    Top-Down                       Top-Down                       Bottom-Up
point      left-to-right                  interleaving

1          {true}                         {true, W = f(Z)}               -

2          {W = f(X), W = g(Y)}           {true, W = f(X), W = g(Y)}     -

3          {W = f(X) ∧ W = f(Z)}          {W = f(X) ∧ W = f(Z)}          {W = f(X) ∧ W = f(Z)}

4, 5, 6    {true}                         {true}                         {true}

Figure 4.3: Program 9 and Its Collecting Semantics




4.5  Environment  Constraints 


The  main  focus  of  this  thesis  is  on  analysis  involving  the  possible  values 
that  each  program  variable  may  assume  (this  kind  of  analysis  is  often  called 
typing  analysis).  However,  set  based  analysis  is  by  no  means  restricted  to 
such  analysis.  For  example,  mode  analysis  and  sharing  analysis  can  also  be 
performed  using  set  based  analysis  techniques  (see  Chapter  9).  The  main 
reason  for  focusing  on  the  values  of  program  variables  is  that  it  enables  a 
clearer  presentation  of  the  concepts  of  set  based  analysis.  In  particular,  it 
enables  program  semantics  to  be  characterized  using  environments  instead 
of  the  more  general  notion  of  equations.  In  turn,  this  allows  a  fairly  simple 
characterization  of  the  collecting  semantics  using  environment  constraints. 

The environment constraints used for logic programs are closely related
to those used for imperative programs. As before, the operations in the
environment constraints are intimately connected to the operators of the
underlying semantics. However, since the operators involved in the semantics
of logic programs differ from those for imperative semantics, we require
some new environment constraint operations.

An environment variable is a variable that ranges over sets of environments,
and shall be denoted by the symbol $\Psi$. For each program point $\alpha$,
there is a distinguished environment variable denoted $\Psi^\alpha$, whose purpose is
to describe the environments corresponding to point $\alpha$. An environment
expression is an expression of the form $[A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]$, where
the $A_i$ and $B_i$ are atoms and the $\Psi_i$ are environment variables. An environment
constraint is of the form $\Psi \supseteq ee$ where $\Psi$ is an environment variable
and $ee$ is an environment expression.

The meaning of environment constraints is defined in the context of an
interpretation $\mathcal{I}$ that maps each environment variable into a set of environments.
If $ee$ is $[A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]$ then $\mathcal{I}(ee)$ is defined by

\[ \mathcal{I}(ee) \stackrel{def}{=} \{\rho : \rho(A_i) \in \{\rho'(B_i) : \rho' \in \mathcal{I}(\Psi_i)\},\ i = 1..n\}. \]

To explain this definition, first note that each expression $B_i.\Psi_i$ essentially
represents a collection of ground atoms under an interpretation $\mathcal{I}$, as follows:

\[ \mathcal{I}(B_i.\Psi_i) = \{\rho'(B_i) : \rho' \in \mathcal{I}(\Psi_i)\}. \]

Now, each element $A_i \in B_i.\Psi_i$ can be thought of as a condition on environments
$\rho$ that holds whenever $\rho(A_i)$ is contained in the set of ground atoms
specified by $B_i.\Psi_i$. That is, $A_i \in B_i.\Psi_i$ is satisfied by an environment $\rho$
under interpretation $\mathcal{I}$ if $\rho(A_i) \in \mathcal{I}(B_i.\Psi_i)$. Finally, the meaning of the entire
expression is just the set of environments that satisfy all of these conditions:

\[ \mathcal{I}([A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]) = \{\rho : \rho(A_i) \in \mathcal{I}(B_i.\Psi_i),\ i = 1..n\}. \]

Note that if $n = 0$ then $\mathcal{I}([A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n])$ simply reduces
to the set of all environments. Now, an interpretation $\mathcal{I}$ is a model of a
constraint $\Psi \supseteq ee$ if $\mathcal{I}(\Psi) \supseteq \mathcal{I}(ee)$. An interpretation is a model of a
collection of environment constraints if it is a model of each constraint in
the collection.
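The meaning of an environment expression can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: terms are encoded as nested tuples, variables as capitalised strings, and environments are enumerated over a small finite universe of ground terms (the thesis places no such bound). The names `apply_env`, `all_envs` and `eval_expr` are hypothetical helpers, not taken from the thesis.

```python
from itertools import product

# Assumed encoding: a variable is a capitalised string ("W"), a constant is a
# lowercase string ("a"), a compound term or atom is a tuple (functor, arg1, ...).
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def apply_env(env, t):
    """Apply an environment (dict: variable -> ground term) to a term or atom."""
    if is_var(t):
        return env[t]
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_env(env, a) for a in t[1:])
    return t

def freeze(env):
    """Hashable form of an environment, so environments can be stored in sets."""
    return frozenset(env.items())

def all_envs(variables, universe):
    """Every environment over `variables` with values drawn from `universe`."""
    vs = sorted(variables)
    return [dict(zip(vs, vals)) for vals in product(universe, repeat=len(vs))]

def eval_expr(ee, interp, variables, universe):
    """I(ee): environments rho with rho(A_i) in {rho'(B_i) : rho' in I(Psi_i)} for all i."""
    result = set()
    for rho in all_envs(variables, universe):
        satisfied = True
        for (a, b, psi) in ee:
            ground_atoms = {apply_env(dict(r), b) for r in interp[psi]}
            if apply_env(rho, a) not in ground_atoms:
                satisfied = False
                break
        if satisfied:
            result.add(freeze(rho))
    return result

# Example: the single entry p(W) in p(f(X)).Psi4, with Psi4 holding every environment.
universe = ["a", ("f", "a")]
variables = ["W", "X"]
interp = {"psi4": {freeze(e) for e in all_envs(variables, universe)}}
ee = [(("p", "W"), ("p", ("f", "X")), "psi4")]
result = eval_expr(ee, interp, variables, universe)
```

In this example the entry forces W to be of the form f(...), while X is left unconstrained; the empty expression `[]` evaluates to every environment, matching the n = 0 case.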

We now present the environment constraints corresponding to each collecting
semantics. In each case, $P$ is a program, and we seek environment
constraints $\mathcal{EC}_P$ such that the least model of $\mathcal{EC}_P$ coincides with $\mathcal{CS}_P$.


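Definition 5 (Top-Down Left-to-Right Environment Constraints)
For each rule $R^\alpha \in P$ with body $A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$ and head $A_0$ (if it exists),
$\mathcal{EC}_P$ contains the constraints

\[
\begin{array}{rcl}
\Psi^{\alpha_1} & \supseteq & [A_0 \in B_0.\Psi^{\beta_0}] \\
\Psi^{\alpha_2} & \supseteq & [A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}] \\
 & \vdots & \\
\Psi^{\alpha_n} & \supseteq & [A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_{n-1} \in B_{n-1}.\Psi^{\beta_{n-1}}] \\
\Psi^{\alpha} & \supseteq & [A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_n \in B_n.\Psi^{\beta_n}]
\end{array}
\]

where $\beta_0$ ranges over body atom labels such that $B_0^{\beta_0}$ is a body atom in $P$
and $A_0$ and $B_0$ are compatible, and the $\beta_i$, $i \geq 1$, range over rule labels such
that $B_i^{\beta_i}$ is a head atom in $P$ and $A_i$ and $B_i$ are compatible. []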


If the rule $R^\alpha$ is a goal, then $A_0$ does not exist, and the entry $A_0 \in B_0.\Psi^{\beta_0}$
is simply deleted from each environment expression. For example, the constraint
$\Psi^{\alpha_1} \supseteq [A_0 \in B_0.\Psi^{\beta_0}]$ becomes $\Psi^{\alpha_1} \supseteq [\ ]$.

Intuitively, the top-down left-to-right constraints each have the form

\[ \Psi^{\alpha_{j+1}} \supseteq [A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_j \in B_j.\Psi^{\beta_j}] \]

where the first entry $A_0 \in B_0.\Psi^{\beta_0}$ corresponds to the "calling" of the rule
$\alpha$ via the body atom $B_0^{\beta_0}$, and the remaining entries $A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_j \in B_j.\Psi^{\beta_j}$
correspond to the "solving" of $A_1, \ldots, A_j$ via the rules $\beta_1, \ldots, \beta_j$. In
other words, in the least model of the constraints, the environments encountered
just before body atom $A_{j+1}$ are those such that the rule is "called",
and all of the atoms to the left of $A_{j+1}$ have been solved. Figure 4.4 illustrates
the construction of top-down environment constraints for Program 9,
whereas Figure 4.5 gives the top-down constraints for a slightly more complex
program involving recursion.
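The construction of the top-down left-to-right constraints can be sketched as a short generator. This is a hedged illustration only: the tuple encoding of rules, the function name `top_down_lr_constraints`, and in particular the `compatible` test (approximated here by equal predicate symbol and arity, since compatibility is not defined in this section) are assumptions of the sketch.

```python
from itertools import product

# Assumed encoding: a rule is (label, head or None for a goal, [(body_atom, label), ...]);
# atoms are tuples (predicate, arg1, ...). This is Program 9.
PROGRAM_9 = [
    (3, None, [(("p", "W"), 1), (("q", "W"), 2)]),
    (4, ("p", ("f", "X")), []),
    (5, ("p", ("g", "Y")), []),
    (6, ("q", ("f", "Z")), []),
]

def compatible(a, b):
    # Assumption: compatibility approximated by same predicate symbol and arity.
    return a[0] == b[0] and len(a) == len(b)

def top_down_lr_constraints(program):
    """Return constraints as (point, [(A_i, B_i, label of Psi), ...])."""
    heads = [(h, lab) for (lab, h, _) in program if h is not None]
    body_atoms = [(b, lab) for (_, _, body) in program for (b, lab) in body]
    out = []
    for (alpha, head, body) in program:
        if head is None:
            callers = [()]  # goal: the A_0 entry is simply deleted
        else:
            callers = [((head, b0, beta0),) for (b0, beta0) in body_atoms
                       if compatible(head, b0)]
        for j in range(len(body) + 1):
            # The point just before body atom j+1, or the rule label once all are solved.
            target = alpha if j == len(body) else body[j][1]
            # For each body atom to the left, every rule whose head is compatible.
            choices = [[(a, bi, beta) for (bi, beta) in heads if compatible(a, bi)]
                       for (a, _) in body[:j]]
            for caller in callers:
                for solved in product(*choices):
                    out.append((target, list(caller) + list(solved)))
    return out

constraints = top_down_lr_constraints(PROGRAM_9)
```

Under these assumptions the generator yields eight constraints for Program 9: one for point 1, two each for points 2 and 3, and one each for rules 4, 5 and 6.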

The environment constraints for the interleaved semantics are similar
to those for the left-to-right. The main difference is that when considering
a body atom $A_i$ in a rule $A_0 \leftarrow A_1, \ldots, A_n$, the entries in the environment
constraints corresponding to the solving of the body atoms to the left of $A_i$
are omitted because the interleaved semantics specifies that goal atoms may
be selected in any order.

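Definition 6 (Top-Down Interleaving Environment Constraints)
For each rule $R^\alpha$ with body $A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$ and head $A_0$ (if it exists), $\mathcal{EC}_P$
contains the constraints:

\[
\begin{array}{rcl}
\Psi^{\alpha_j} & \supseteq & [A_0 \in B_0.\Psi^{\beta_0}], \quad j = 1..n \\
\Psi^{\alpha} & \supseteq & [A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_n \in B_n.\Psi^{\beta_n}]
\end{array}
\]

where $\beta_0$ ranges over body atom labels such that $B_0^{\beta_0}$ is a body atom in $P$
and $A_0$ and $B_0$ are compatible, and the $\beta_i$, $i \geq 1$, range over rule labels such
that $B_i^{\beta_i}$ is a head atom in $P$ and $A_i$ and $B_i$ are compatible. []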

Again, if $R^\alpha$ is a goal, then the entry $A_0 \in B_0.\Psi^{\beta_0}$ is simply deleted.
Figure 4.6 illustrates the construction of the environment constraints for the
interleaved semantics using Program 9. Note that in the case of Program 10
(Figure 4.5), the constraints for the interleaved semantics are the same as
those for the top-down left-to-right semantics since the rules in Program 10
have fewer than two body atoms.

The environment constraints for the bottom-up semantics are essentially
stripped down versions of the top-down constraints. In particular, all entries

    3.  $\leftarrow p(W)^1, q(W)^2.$
    4.  $p(f(X)).$
    5.  $p(g(Y)).$
    6.  $q(f(Z)).$

    $\Psi^1 \supseteq [\ ]$
    $\Psi^2 \supseteq [p(W) \in p(f(X)).\Psi^4]$
    $\Psi^2 \supseteq [p(W) \in p(g(Y)).\Psi^5]$
    $\Psi^3 \supseteq [p(W) \in p(f(X)).\Psi^4,\ q(W) \in q(f(Z)).\Psi^6]$
    $\Psi^3 \supseteq [p(W) \in p(g(Y)).\Psi^5,\ q(W) \in q(f(Z)).\Psi^6]$
    $\Psi^4 \supseteq [p(f(X)) \in p(W).\Psi^1]$
    $\Psi^5 \supseteq [p(g(Y)) \in p(W).\Psi^1]$
    $\Psi^6 \supseteq [q(f(Z)) \in q(W).\Psi^2]$


Figure 4.4: Program 9 and Its Top-Down Left-To-Right Constraints


    2.  $\leftarrow loop(a.b.nil, V)^1.$
    3.  $loop(W.nil, W).$
    5.  $loop(X.L, Y) \leftarrow loop(L, Y)^4.$

    $\Psi^1 \supseteq [\ ]$
    $\Psi^2 \supseteq [loop(a.b.nil, V) \in loop(W.nil, W).\Psi^3]$
    $\Psi^2 \supseteq [loop(a.b.nil, V) \in loop(X.L, Y).\Psi^5]$
    $\Psi^3 \supseteq [loop(W.nil, W) \in loop(a.b.nil, V).\Psi^1]$
    $\Psi^3 \supseteq [loop(W.nil, W) \in loop(L, Y).\Psi^4]$
    $\Psi^4 \supseteq [loop(X.L, Y) \in loop(a.b.nil, V).\Psi^1]$
    $\Psi^4 \supseteq [loop(X.L, Y) \in loop(L, Y).\Psi^4]$
    $\Psi^5 \supseteq [loop(X.L, Y) \in loop(a.b.nil, V).\Psi^1,\ loop(L, Y) \in loop(W.nil, W).\Psi^3]$
    $\Psi^5 \supseteq [loop(X.L, Y) \in loop(a.b.nil, V).\Psi^1,\ loop(L, Y) \in loop(X.L, Y).\Psi^5]$
    $\Psi^5 \supseteq [loop(X.L, Y) \in loop(L, Y).\Psi^4,\ loop(L, Y) \in loop(W.nil, W).\Psi^3]$
    $\Psi^5 \supseteq [loop(X.L, Y) \in loop(L, Y).\Psi^4,\ loop(L, Y) \in loop(X.L, Y).\Psi^5]$


Figure 4.5: Program 10 and Its Top-Down Left-To-Right Constraints

    3.  $\leftarrow p(W)^1, q(W)^2.$
    4.  $p(f(X)).$
    5.  $p(g(Y)).$
    6.  $q(f(Z)).$

    $\Psi^1 \supseteq [\ ]$
    $\Psi^2 \supseteq [\ ]$
    $\Psi^3 \supseteq [p(W) \in p(f(X)).\Psi^4,\ q(W) \in q(f(Z)).\Psi^6]$
    $\Psi^4 \supseteq [p(f(X)) \in p(W).\Psi^1]$
    $\Psi^5 \supseteq [p(g(Y)) \in p(W).\Psi^1]$
    $\Psi^6 \supseteq [q(f(Z)) \in q(W).\Psi^2]$


Figure 4.6: Program 9 and Its Top-Down Interleaved Constraints


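    3.  $\leftarrow p(W)^1, q(W)^2.$
    4.  $p(f(X)).$
    5.  $p(g(Y)).$
    6.  $q(f(Z)).$

    $\Psi^3 \supseteq [p(W) \in p(f(X)).\Psi^4,\ q(W) \in q(f(Z)).\Psi^6]$
    $\Psi^3 \supseteq [p(W) \in p(g(Y)).\Psi^5,\ q(W) \in q(f(Z)).\Psi^6]$
    $\Psi^4 \supseteq [\ ]$
    $\Psi^5 \supseteq [\ ]$
    $\Psi^6 \supseteq [\ ]$


Figure 4.7: Program 9 and Its Bottom-Up Constraints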


dealing  with  the  “calling”  of  the  rule  are  deleted.  This  means  that  the  only 
relevant  program  points  are  those  corresponding  to  rules  and  goals. 


Definition 7 (Bottom-Up Environment Constraints)

For each rule $R^\alpha$ with body $A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$, $\mathcal{EC}_P$ contains the constraints:

\[ \Psi^{\alpha} \supseteq [A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_n \in B_n.\Psi^{\beta_n}] \]

where the $\beta_i$, $i \geq 1$, range over rule labels such that $B_i^{\beta_i}$ is a head atom in
$P$ and $A_i$ and $B_i$ are compatible. []

Figures 4.7 and 4.8 illustrate the construction of the bottom-up constraints
for Programs 9 and 10.
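As a rough illustration of what a least model looks like, the sketch below iterates the bottom-up constraints of Program 9 to a fixed point. It rests on a strong assumption that is not part of the thesis: the universe of ground terms is truncated to three terms so that naive iteration terminates. The helper names (`subst`, `eval_expr`, `least_model`) and the tuple encoding of atoms are likewise hypothetical.

```python
from itertools import product

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def subst(env, t):
    """Apply an environment (dict: variable -> ground term) to a term or atom."""
    if is_var(t):
        return env[t]
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(env, a) for a in t[1:])
    return t

# Assumption: a tiny finite universe, chosen only so that iteration terminates.
VARS = ("W", "X", "Y", "Z")
UNIVERSE = ("a", ("f", "a"), ("g", "a"))
ALL_ENVS = [dict(zip(VARS, vals)) for vals in product(UNIVERSE, repeat=len(VARS))]

def p(t):
    return ("p", t)

def q(t):
    return ("q", t)

# Bottom-up constraints for Program 9: (left-hand variable, [(A_i, B_i, Psi_i), ...]).
CONSTRAINTS = [
    ("psi3", [(p("W"), p(("f", "X")), "psi4"), (q("W"), q(("f", "Z")), "psi6")]),
    ("psi3", [(p("W"), p(("g", "Y")), "psi5"), (q("W"), q(("f", "Z")), "psi6")]),
    ("psi4", []),
    ("psi5", []),
    ("psi6", []),
]

def eval_expr(ee, interp):
    """Environments satisfying every entry of ee under interp."""
    out = set()
    for rho in ALL_ENVS:
        if all(subst(rho, a) in {subst(dict(r), b) for r in interp[psi]}
               for (a, b, psi) in ee):
            out.add(frozenset(rho.items()))
    return out

def least_model(constraints):
    """Grow every Psi from the empty set until no constraint adds anything."""
    interp = {lhs: set() for lhs, _ in constraints}
    changed = True
    while changed:
        changed = False
        for lhs, ee in constraints:
            new = eval_expr(ee, interp) - interp[lhs]
            if new:
                interp[lhs] |= new
                changed = True
    return interp

model = least_model(CONSTRAINTS)
w_values = {dict(r)["W"] for r in model["psi3"]}
```

In this truncated setting only the first constraint for point 3 contributes: W must be both an f-term (from p and q) and, under the second constraint, a g-term, which is impossible. The variables X, Y and Z remain unconstrained at point 3, which is why correspondence with the collecting semantics is stated only relative to the variables relevant to a point.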

We now address the correctness of the environment constraints, which essentially
states that the least model of a program's environment constraints

    2.  $\leftarrow loop(a.b.nil, V)^1.$
    3.  $loop(W.nil, W).$
    5.  $loop(X.L, Y) \leftarrow loop(L, Y)^4.$

    $\Psi^2 \supseteq [loop(a.b.nil, V) \in loop(W.nil, W).\Psi^3]$
    $\Psi^2 \supseteq [loop(a.b.nil, V) \in loop(X.L, Y).\Psi^5]$
    $\Psi^5 \supseteq [loop(L, Y) \in loop(W.nil, W).\Psi^3]$
    $\Psi^5 \supseteq [loop(L, Y) \in loop(X.L, Y).\Psi^5]$

Figure 4.8: Program 10 and Its Bottom-Up Constraints


corresponds to the program's collecting semantics. To formalize this statement,
first recall that environment constraints focus on the run-time values
of program variables (as opposed to other run-time properties such as variable
instantiation, aliasing or sharing between variables), and that these
values are captured using sets of environments. In essence, the environment
constraints of a program are written to capture minimal consistency conditions
(with respect to the operational semantics at hand) between sets of
environments associated with neighboring points in the program. The least
model of these constraints defines the smallest (or most accurate) consistent
assignment of environments to program points.

In contrast, the collecting semantics of a program defines a set of equation
conjunctions for each program point. These equation conjunctions implicitly
contain information about the possible values of program variables,
in addition to other information. The information about variable values can
be made explicit by considering the environments that satisfy the equation
conjunctions. Specifically, for each program point $\alpha$, the collecting semantics
$\mathcal{CS}_P$ defines a set of environments

\[ \{\rho : \rho \models E \mbox{ and } E \in \mathcal{CS}_P(\alpha)\}. \]

In essence, this set of environments is the same as the set of environments
associated with $\alpha$ by the least model of the environment constraints.
However, for technical reasons, the correspondence is not exact. Rather, the
sets are equivalent only in the context of the program variables relevant to
the point $\alpha$. Specifically, $var(\alpha)$, the set of variables relevant to point
$\alpha$, is defined as follows. If $\alpha$ is a rule label, then $var(\alpha) = var(R)$ where $R$ is
the rule in $P$ with label $\alpha$. If $\alpha$ is a body atom label, then $var(\alpha) = var(R)$
where $R$ is the rule in $P$ that contains the body atom with label $\alpha$. Now,
where $var$ is a set of program variables and $\rho$ and $\rho'$ are environments, define
that $\rho =_{var} \rho'$ if $\rho(X) = \rho'(X)$ for each $X \in var$. Also, where $S$ and $S'$ are
sets of environments, define that $S =_{var} S'$ if, for every environment $\rho \in S$,
there is an environment $\rho' \in S'$ such that $\rho =_{var} \rho'$, and vice-versa. Finally,
the correctness of the environment constraints can be stated as follows.
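The two equivalences just defined are easy to state operationally. The following is a minimal sketch, assuming environments are represented as Python dictionaries from variable names to values; the function names `env_eq` and `set_eq` are hypothetical.

```python
def env_eq(rho1, rho2, var):
    """rho1 =_var rho2: the two environments agree on every variable in var."""
    return all(rho1[x] == rho2[x] for x in var)

def set_eq(s1, s2, var):
    """S =_var S': every environment in one set has a =_var partner in the other."""
    forward = all(any(env_eq(r1, r2, var) for r2 in s2) for r1 in s1)
    backward = all(any(env_eq(r2, r1, var) for r1 in s1) for r2 in s2)
    return forward and backward

# Two sets that differ only on X are equivalent relative to {W}, but not {W, X}.
s1 = [{"W": "a", "X": "b"}]
s2 = [{"W": "a", "X": "c"}, {"W": "a", "X": "d"}]
same_on_w = set_eq(s1, s2, {"W"})
same_on_wx = set_eq(s1, s2, {"W", "X"})
```

Note that the sets need not have the same size: equivalence relative to a set of variables only compares the projections onto those variables.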


Theorem 3 (Environment Constraint Correctness)

The following correspondence holds under the top-down left-to-right semantics,
the top-down interleaving semantics and the bottom-up semantics:

\[ lm(\mathcal{EC}_P)(\Psi^\alpha) =_{var(\alpha)} \{\rho : \rho \models E \mbox{ and } E \in \mathcal{CS}_P(\alpha)\}, \mbox{ for all points } \alpha. \]

[]

The proof of this theorem is contained in the next section. Again the proof
is lengthy and tedious and is included mainly for the sake of completeness.
Note that since environment constraints reason at the ground level (that
is, they use environments as opposed to substitutions or constraints), an
important component of this proof involves proving that reasoning at the
ground level is adequate for the purposes of determining the possible run-time
values of program variables. In contrast, reasoning at the ground level
is not adequate for determining other run-time properties such as variable
instantiation, aliasing or sharing.

4.6  Environment  Constraint  Correctness 


We now prove the correctness of the environment constraints for the top-down
left-to-right and interleaved semantics as well as for the bottom-up
semantics (Theorem 3). This involves establishing an equivalence between a
program's collecting semantics and its environment constraints. In essence,
the definition of collecting semantics and the environment constraints differ
in two respects. First, they deal with different objects - the collecting semantics
deals with equation conjunctions and the environment constraints
deal with environments. Second, the structures of the definitions are different
- the collecting semantics is based on a rewrite relation, whereas
the environment constraints use constraints that express local consistency
conditions. The proof of correctness of the environment constraints is correspondingly
in two parts. We begin by defining a ground collecting semantics
that uses environments instead of equation conjunctions. The first part of
the correctness proof then establishes a connection between the (original)
collecting semantics and the ground collecting semantics, and the second
step relates the ground collecting semantics with environment constraints.
We begin with the top-down environment constraints.


4.6.1  Correctness  of  Top-Down  Constraints 

In essence, the states of the ground semantics are constructed from states
that do not contain any program variables. Now, the equation conjunction
of such a state is just a conjunction of equations between ground expressions,
and these are simply true (since the equation conjunction in a state is
required to be satisfiable), and hence can be omitted. Therefore, the states
of the ground semantics are just sequences of ground atoms, that is, ground
goals (we shall often abuse notation and omit the "$\leftarrow$" part of a goal). In the
context of some atom selection function, we define that there is a derivation
step $G \xrightarrow{i,\alpha,\rho} G'$, where $G$ and $G'$ are ground goals, if

(i) $G$ is $A_1^{\alpha_1}, \ldots, A_m^{\alpha_m}$ and the atom selection function maps $G$ into $i$;

(ii) the rule with label $\alpha$ in $P$ is $B_0 \leftarrow B_1^{\beta_1}, \ldots, B_r^{\beta_r}$;

(iii) $\rho$ is an environment such that $A_i = \rho(B_0)$; and

(iv) $G'$ is $A_1^{\alpha_1}, \ldots, A_{i-1}^{\alpha_{i-1}}, \rho(B_1)^{\beta_1}, \ldots, \rho(B_r)^{\beta_r}, A_{i+1}^{\alpha_{i+1}}, \ldots, A_m^{\alpha_m}$.
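Conditions (i) to (iv) describe a simple rewriting operation on lists of ground atoms, which can be sketched as follows. The encoding is an assumption of this sketch: atoms are nested tuples, the list constructor is written as the functor ".", program point labels are omitted, and `ground_step` is a hypothetical name.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def subst(env, t):
    """Apply an environment (dict: variable -> ground term) to a term or atom."""
    if is_var(t):
        return env[t]
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(env, a) for a in t[1:])
    return t

def ground_step(goal, i, rule, rho):
    """One ground derivation step: replace the i-th atom (1-based) by rho of the rule body.

    goal: list of ground atoms; rule: (head B_0, [B_1, ..., B_r]); rho: environment.
    """
    head, body = rule
    selected = goal[i - 1]
    if subst(rho, head) != selected:  # condition (iii): A_i = rho(B_0)
        raise ValueError("rho(B_0) does not match the selected atom")
    new_atoms = [subst(rho, b) for b in body]  # rho(B_1), ..., rho(B_r)
    return goal[: i - 1] + new_atoms + goal[i:]  # condition (iv)

def cons(h, t):
    return (".", h, t)

# The two rules of Program 10 (labels omitted).
RULE_3 = (("loop", cons("W", "nil"), "W"), [])
RULE_5 = (("loop", cons("X", "L"), "Y"), [("loop", "L", "Y")])

# A ground instance of the goal loop(a.b.nil, V), with V bound to b.
g0 = [("loop", cons("a", cons("b", "nil")), "b")]
g1 = ground_step(g0, 1, RULE_5, {"X": "a", "L": cons("b", "nil"), "Y": "b"})
g2 = ground_step(g1, 1, RULE_3, {"W": "b"})
```

The two steps rewrite the ground goal to `loop(b.nil, b)` and then to the empty goal, so this ground derivation solves the initial atom.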


A ground derivation is a sequence of derivation steps of the form

\[ G_0 \xrightarrow{i_1,\alpha_1,\rho_1} G_1 \cdots \xrightarrow{i_n,\alpha_n,\rho_n} G_n. \]

Let $\mathcal{D}$ be a ground derivation and suppose that $\mathcal{D}$ contains a ground derivation
step of the form $G \xrightarrow{i,\alpha,\rho} G'$. Let $G$ have the form $A_1, \ldots, A_m$ and
let $R$ be $B_0 \leftarrow B_1, \ldots, B_r$. Now, the ground goal $G'$ contains the new atoms
$\rho(B_1), \ldots, \rho(B_r)$, which do not appear in $G$. Each new atom $\rho(B_j)$ is called
a child of the atom $A_i$ selected from $G$. An atom $A^\alpha$ is a descendant of an
atom $B^\beta$ if either $A^\alpha$ is a child of $B^\beta$ or else $A^\alpha$ is a child of a descendant
of $B^\beta$.


A ground derivation $\mathcal{D}$ solves an atom $A^\alpha$ if $A^\alpha$ is selected at some step
in $\mathcal{D}$ and all of the descendants of $A^\alpha$ are solved in subsequent steps of $\mathcal{D}$.
A ground derivation $\mathcal{D}$ minimally solves an atom $A^\alpha$ if $A^\alpha$ is solved by $\mathcal{D}$
but is not solved by the ground derivation consisting of all but the last step
of $\mathcal{D}$.

The ground collecting semantics is defined using the class of ground
derivations whose first goal is a ground instance of the initial goal $G_0$. Let
$\mathcal{D}$ be such a derivation. $\mathcal{D}$ introduces $A^\alpha$ at step $k$ using environment $\rho$ if
either (a) the $k^{th}$ step of $\mathcal{D}$ is $G \xrightarrow{i,\beta,\rho} G'$ and $A^\alpha$ is an atom that appears
in $G'$ but not in $G$, or else (b) $k = 0$, the first goal of $\mathcal{D}$ is $\rho(G_0)$ and $A^\alpha$
appears in $\rho(G_0)$. Similarly, $\mathcal{D}$ uses rule $R^\alpha$ and environment $\rho$ at step $k$
if either (a) the $k^{th}$ step of $\mathcal{D}$ is $G \xrightarrow{i,\alpha,\rho} G'$ and $R$ is the rule in $P$ with
label $\alpha$, or else (b) $k = 0$, the first goal of $\mathcal{D}$ is $\rho(G_0)$ and the label of $G_0$
is $\alpha$. $\mathcal{D}$ is said to select $A^\alpha$ under environment $\rho$ if $A^\alpha$ is the atom selected
from $G_n$ and $A^\alpha$ is introduced using $\rho$. The derivation $\mathcal{D}$ returns from the
rule $R^\alpha$ under environment $\rho$ if, for some $k \geq 0$, the atoms introduced by
step $k$ are solved in the subsequent steps of $\mathcal{D}$, one of the introduced atoms
is minimally solved in the subsequent steps of $\mathcal{D}$ and $\mathcal{D}$ uses $R^\alpha$ (a rule or
goal) and environment $\rho$ at step $k$.

Now, in analogy to the collecting semantics $\mathcal{CS}$ defined using derivations,
a ground collecting semantics can be defined using ground derivations.


Definition 8

Given an initial goal $G_0$, the top-down ground collecting semantics of a logic
program $P$ is the mapping $\mathcal{GCS}_P$ such that

\[
\begin{array}{rcl}
\mathcal{GCS}_P(\alpha) & = & \{\rho : \mbox{there is a ground derivation from } G_0' \mbox{ that selects } A^\alpha \mbox{ under } \rho\} \\
\mathcal{GCS}_P(\beta) & = & \{\rho : \mbox{there is a ground derivation from } G_0' \mbox{ that returns from } R^\beta \mbox{ under } \rho\}
\end{array}
\]

where $\alpha$ ranges over all body atom labels, $\beta$ ranges over all rule labels and
$G_0'$ ranges over ground instances of $G_0$. []


We now prove that $\mathcal{CS}_P$ and $\mathcal{GCS}_P$ are essentially equivalent. We begin
by proving two important connections between derivations and ground
derivations. These connections only hold if the atom selection function
satisfies the following property: the atom selection function maps $A_1, \ldots, A_n$
into $i$ iff it maps $\theta(A_1), \ldots, \theta(A_n)$ into $i$, where $\theta$ is an arbitrary substitution.
In other words, we require that the atom selection function selects
atoms only on the basis of their place in the sequence and their predicate
symbol. This is clearly the case for the selection function used in the top-down
left-to-right semantics (which always selects the leftmost atom). In
an intuitive sense, it also holds for the atom selection function used in the
interleaved semantics, since in this case the atom selection function
non-deterministically chooses one of the atoms from the goal. However, since
the application of the atom selection function to a goal is not uniquely defined,
the property is perhaps more accurately stated as: the atom selection
function may map $A_1, \ldots, A_n$ into $i$ iff it may map $\theta(A_1), \ldots, \theta(A_n)$ into
$i$, where $\theta$ is an arbitrary substitution. Before stating the next proposition,
we recall that the composition $\rho \circ \theta$ of an environment $\rho$ with a renaming
substitution $\theta$ yields the environment that maps each program variable $X$
into $\rho(\theta(X))$.
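The composition of an environment with a renaming is used repeatedly in the proofs that follow, so a tiny sketch may help fix intuition. It assumes environments and renamings are Python dictionaries, and that a renaming acts as the identity on variables it does not mention; `compose` is a hypothetical name.

```python
def compose(rho, theta, variables):
    """(rho o theta)(X) = rho(theta(X)) for each X in `variables`.

    theta is a renaming (dict: variable -> variable); variables it does not
    mention are assumed to be left unchanged.
    """
    return {x: rho[theta.get(x, x)] for x in variables}

# theta renames the rule variables X and Y apart; W is not renamed.
theta = {"X": "X1", "Y": "Y1"}
rho = {"X1": "a", "Y1": ("f", "b"), "W": "c"}
composed = compose(rho, theta, ["X", "Y", "W"])
```

Here `composed` gives the values that the original, un-renamed rule variables receive: the value of X under the composition is the value of its renamed copy X1 under rho.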

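Proposition 7 Let $\mathcal{D}$ be a derivation of length $n$ ending in state $\langle E : G \rangle$.
If $\rho \models E$ then there exists a ground derivation $\mathcal{D}'$ such that, for $j = 1..n$,
the $j^{th}$ steps of $\mathcal{D}$ and $\mathcal{D}'$ are respectively

\[ \langle E_a : G_a \rangle \xrightarrow{i_j,\alpha_j,\theta_j} \langle E_b : G_b \rangle \quad \mbox{and} \quad \rho(G_a) \xrightarrow{i_j,\alpha_j,\rho \circ \theta_j} \rho(G_b). \]

Proof: The proof is by induction on the length of $\mathcal{D}$. If $\mathcal{D}$ has length 0,
then it consists of a single state, call it $\langle E : G \rangle$. If $\rho \models E$, then $\mathcal{D}'$ can be
defined to be the single ground goal $\rho(G)$, and this completes the base case.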

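Now, suppose that the proposition holds for derivations of length $n$, and
consider a derivation of the form

\[ \langle E_0 : G_0 \rangle \xrightarrow{i_1,\alpha_1,\theta_1} \cdots \xrightarrow{i_n,\alpha_n,\theta_n} \langle E_n : G_n \rangle \xrightarrow{i,\alpha,\theta} \langle E : G \rangle. \]

Suppose that $\rho \models E$. This implies that $\rho \models E_n$ and by the induction
hypothesis, it follows that there exists a ground derivation

\[ \rho(G_0) \xrightarrow{i_1,\alpha_1,\rho_1} \cdots \xrightarrow{i_n,\alpha_n,\rho_n} \rho(G_n) \]

such that each $\rho_i$ is $\rho \circ \theta_i$. It remains to show that there is a derivation step
$\rho(G_n) \xrightarrow{i,\alpha,\rho \circ \theta} \rho(G)$. Let $G_n$ be $A_1, \ldots, A_m$, and let the rule in $P$ with
label $\alpha$ be $B_0 \leftarrow B_1, \ldots, B_r$. Since $\langle E_n : G_n \rangle \xrightarrow{i,\alpha,\theta} \langle E : G \rangle$, we have

\[
\begin{array}{rcl}
G & = & A_1, \ldots, A_{i-1}, \theta(B_1), \ldots, \theta(B_r), A_{i+1}, \ldots, A_m \\
E & = & E_n \wedge (A_i = \theta(B_0))
\end{array}
\]

Hence, the difference between $\rho(G_n)$ and $\rho(G)$ is that $\rho(A_i)$ in $\rho(G_n)$ is replaced
by $\rho(\theta(B_1)), \ldots, \rho(\theta(B_r))$ in $\rho(G)$. Now, since $\rho \models E$, it follows that
$\rho(A_i) = \rho(\theta(B_0))$. Hence $\rho(\theta(R))$ is equal to $\rho(A_i) \leftarrow \rho(\theta(B_1)), \ldots, \rho(\theta(B_r))$,
and it follows that $\rho(G_n) \xrightarrow{i,\alpha,\rho \circ \theta} \rho(G)$. []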

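Proposition 8 Let $\mathcal{D}'$ be a ground derivation of length $n$ starting from
ground goal $G_0'$. If $G_0$ is a sequence of atoms such that $G_0'$ is a ground
instance of $G_0$, then there exists a derivation $\mathcal{D}$ from $\langle true : G_0 \rangle$ to $\langle E_n : G_n \rangle$
and an environment $\rho_n \models E_n$ such that, for $j = 1..n$, the $j^{th}$ steps of $\mathcal{D}$ and
$\mathcal{D}'$ are respectively

\[ \langle E_a : G_a \rangle \xrightarrow{i_j,\alpha_j,\theta_j} \langle E_b : G_b \rangle \quad \mbox{and} \quad \rho_n(G_a) \xrightarrow{i_j,\alpha_j,\rho_j} \rho_n(G_b) \]

such that $\rho_j =_{var(\alpha_j)} \rho_n \circ \theta_j$.

Proof: The proof is by induction on the length of $\mathcal{D}'$. If $\mathcal{D}'$ has length 0,
then it consists of a single ground goal, call it $G_0'$. If $G_0$ is a sequence of atoms
such that $G_0'$ is a ground instance of $G_0$, then there exists an environment $\rho$
such that $G_0'$ is $\rho(G_0)$. Clearly $\rho \models true$, and so assigning $\mathcal{D}$ to be the derivation
consisting of the single state $\langle true : G_0 \rangle$ completes the base case.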

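Now, suppose that the proposition holds for derivations of length $n$, and
consider a ground derivation of the form

\[ G_0' \xrightarrow{i_1,\alpha_1,\rho_1} \cdots \xrightarrow{i_n,\alpha_n,\rho_n} G_n' \xrightarrow{i,\alpha,\rho} G'. \]

Suppose that $G_0$ is a sequence of atoms such that $G_0'$ is a ground instance of $G_0$. By the
induction hypothesis, there exists an environment $\rho'$ and a derivation

\[ \langle E_0 : G_0 \rangle \xrightarrow{i_1,\alpha_1,\theta_1} \cdots \xrightarrow{i_n,\alpha_n,\theta_n} \langle E_n : G_n \rangle \]

such that $\rho' \models E_n$, $G_j' = \rho'(G_j)$, and $\rho_j =_{var(\alpha_j)} \rho' \circ \theta_j$, where $j = 1..n$.
It remains to show that there is an appropriate derivation step of the
form $\langle E_n : G_n \rangle \xrightarrow{i,\alpha,\theta} \langle E : G \rangle$. Now, let $G_n$ be $A_1, \ldots, A_m$, so that $G_n'$ is
$\rho'(A_1), \ldots, \rho'(A_m)$. Also let the rule in $P$ with label $\alpha$ be $B_0 \leftarrow B_1, \ldots, B_r$.
Since $G_n' \xrightarrow{i,\alpha,\rho} G'$,

\[ G' = \rho'(A_1), \ldots, \rho'(A_{i-1}), \rho(B_1), \ldots, \rho(B_r), \rho'(A_{i+1}), \ldots, \rho'(A_m). \]

Now, let $\theta$ be a renaming substitution such that $var(\theta(R)) \cap var(E_n, G_n) = \{\}$,
and define $G$ and $E$ as follows

\[
\begin{array}{rcl}
G & = & A_1, \ldots, A_{i-1}, \theta(B_1), \ldots, \theta(B_r), A_{i+1}, \ldots, A_m \\
E & = & E_n \wedge (A_i = \theta(B_0)).
\end{array}
\]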


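This defines the derivation $\mathcal{D}$, and it remains to show that there exists an
appropriate environment $\rho''$ such that $\mathcal{D}$ and $\mathcal{D}'$ have the necessary relationship.
Define $\rho''$ as follows:

\[
\rho''(X) = \left\{
  \begin{array}{ll}
    \rho'(X) & \mbox{if } X \in var(E_n, G_n) \\
    \rho(\theta^{-1}(X)) & \mbox{otherwise.}
  \end{array}
\right.
\]

Since $var(E_0, G_0) \subseteq var(E_1, G_1) \subseteq \cdots \subseteq var(E_n, G_n)$, it follows that
$\rho''(G_j) = \rho'(G_j)$, $j = 1..n$. Hence, $\rho''(G_j) = G_j'$, $j = 1..n$. Also, $\rho''(\theta(B_l)) =
\rho(\theta^{-1}(\theta(B_l))) = \rho(B_l)$, $l = 1..r$, and so $\rho''(G)$ is

\[ \rho'(A_1), \ldots, \rho'(A_{i-1}), \rho(B_1), \ldots, \rho(B_r), \rho'(A_{i+1}), \ldots, \rho'(A_m) \]

and hence $\rho''(G) = G'$. In summary, the steps of $\mathcal{D}$ and $\mathcal{D}'$ have the
respective forms

\[ \langle E_a : G_a \rangle \xrightarrow{i_j,\alpha_j,\theta_j} \langle E_b : G_b \rangle \quad \mbox{and} \quad \rho''(G_a) \xrightarrow{i_j,\alpha_j,\rho_j} \rho''(G_b) \]

and so it only remains to show that $\rho_j =_{var(\alpha_j)} \rho'' \circ \theta_j$, $1 \leq j \leq n+1$.
Now, for $j \leq n$, $X \in var(\alpha_j)$ implies that $\theta_j(X) \in var(E_n, G_n)$, and so
$\rho'' \circ \theta_j(X) = \rho''(\theta_j(X)) = \rho'(\theta_j(X)) = \rho_j(X)$ (this last step follows from
$\rho_j =_{var(\alpha_j)} \rho' \circ \theta_j$). Finally, in the case where $j = n+1$, $\rho_j =_{var(\alpha_j)} \rho'' \circ \theta_j$
reduces to $\rho =_{var(\alpha)} \rho'' \circ \theta$, which follows immediately from the definition of
$\rho''$. []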


Using these propositions, the following correspondence between $\mathcal{CS}_P$ and
$\mathcal{GCS}_P$ can be established.


Lemma 3 Let $\mathcal{CS}_P$ and $\mathcal{GCS}_P$ be the collecting semantics and ground collecting
semantics for an initial goal $G_0$. Then, for each program point $\alpha$,

\[ \mathcal{GCS}_P(\alpha) =_{var(\alpha)} \{\rho : \rho \models E \mbox{ for some } E \in \mathcal{CS}_P(\alpha)\} \qquad (4.3) \]


Proof: The proof proceeds in two parts, according to whether $\alpha$ is a rule
label or body label. First consider the case where $\alpha$ is a rule label, and let $R$
be the rule in $P$ with label $\alpha$. If $\rho \in \mathcal{GCS}_P(\alpha)$ then there exists a ground
instance $G_0'$ of the initial goal $G_0$, and a ground derivation $\mathcal{D}'$ from $G_0'$ that
returns from $R^\alpha$ under $\rho$. Let $n$ be the length of $\mathcal{D}'$. Clearly there must
be some $k$, $k \leq n$, such that the $k^{th}$ step of $\mathcal{D}'$ is of the form $G_a' \xrightarrow{i,\alpha,\rho} G_b'$
such that each atom introduced during this step is solved in $\mathcal{D}'$ and one of
the atoms introduced is minimally solved. Now, Proposition 8 implies that
there exists a derivation $\mathcal{D}$ from $\langle true : G_0 \rangle$ to $\langle E_n : G_n \rangle$ and an environment
$\rho'$ such that the $k^{th}$ step of $\mathcal{D}$ is $\langle E_a : G_a \rangle \xrightarrow{i,\alpha,\theta} \langle E_b : G_b \rangle$, $\rho' \circ \theta =_{var(\alpha)} \rho$,
and $\mathcal{D}$ solves the atoms introduced by the $k^{th}$ step, and minimally solves
one of them. Hence $\mathcal{D}$ returns from $R^\alpha$ under $\theta$ and so $\theta^{-1}(E_n) \in \mathcal{CS}_P(\alpha)$.
Now, $\rho' \models E_n$, and so $\rho' \circ \theta \models \theta^{-1}(E_n)$. It follows that $\rho' \circ \theta$ is an element of
the set on the right hand side of (4.3).

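Conversely, if $\rho \models E$ for some $E \in \mathcal{CS}_P(\alpha)$ then there exists a derivation
$\mathcal{D}$ from $\langle true : G_0 \rangle$ to $\langle E_n : G_n \rangle$ that returns from $R^\alpha$ under $\theta$ and $E$ is
$\theta^{-1}(E_n)$. From $\rho \models E$ and the fact that $E$ is $\theta^{-1}(E_n)$, it follows that
$\rho \circ \theta^{-1} \models E_n$. Again, let $n$ be the length of $\mathcal{D}$ and let $k$ be such that the
$k^{th}$ step of $\mathcal{D}$ is of the form $\langle E_a : G_a \rangle \xrightarrow{i,\alpha,\theta} \langle E_b : G_b \rangle$ and each body atom
introduced during this step is solved in $\mathcal{D}$, and one of them is minimally
solved. Now, Proposition 7 implies that there exists a ground derivation $\mathcal{D}'$ from
$\rho(G_0)$ to $\rho(G_n)$ such that the $k^{th}$ step of $\mathcal{D}'$ is $\rho(G_a) \xrightarrow{i,\alpha,\rho'} \rho(G_b)$ where
$\rho' =_{var(\alpha)} \rho \circ \theta^{-1} \circ \theta$, and $\mathcal{D}'$ solves the atoms introduced by this step and
minimally solves one of them. Hence $\mathcal{D}'$ returns from $R^\alpha$ under $\rho$ and so
$\rho \in \mathcal{GCS}_P(\alpha)$.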

The second part of the proof deals with the case where $\alpha$ is the label of
a body atom in $P$; the proof here closely parallels that for the first part. If
$\rho \in \mathcal{GCS}_P(\alpha)$ then there exists a ground instance $G_0'$ of the initial goal $G_0$,
and a ground derivation $\mathcal{D}'$ from $G_0'$ to $G_n'$ that selects $A^\alpha$ under $\rho$. That is,
the atom selection function maps $G_n'$ into $i$ and the $i^{th}$ element of $G_n'$ is $A^\alpha$
such that $A^\alpha$ is introduced using $\rho$. Now, Proposition 8 implies that there
exists a derivation $\mathcal{D}$ from $\langle true : G_0 \rangle$ to $\langle E_n : G_n \rangle$ and an environment $\rho'$
such that the $i^{th}$ element of $G_n$ is of the form $B^\alpha$ and is introduced using
renaming $\theta$ where $\rho' \circ \theta =_{var(\alpha)} \rho$. By the assumption that substitutions do
not affect the operation of the atom selection function, the $i^{th}$ atom of $G_n$ is
selected. This implies that $\mathcal{D}$ selects $B^\alpha$ under $\theta$ and so $\theta^{-1}(E_n) \in \mathcal{CS}_P(\alpha)$.
Since $\rho' \models E_n$, it follows that $\rho' \circ \theta \models \theta^{-1}(E_n)$. Hence $\rho' \circ \theta$ is an element
of the set on the right hand side of (4.3).

Conversely, suppose that $\rho \models E$ for some $E \in \mathcal{CS}_P(\alpha)$. Then there exists
a derivation $\mathcal{D}$ from $\langle E_0 : G_0 \rangle$ to $\langle E_n : G_n \rangle$ that selects an atom of the form
$A^\alpha$ under $\theta$ such that $E$ is $\theta^{-1}(E_n)$. That is, the atom selection function
maps $G_n$ into $i$ and the $i^{th}$ element of $G_n$ is $A^\alpha$ and this atom is introduced
using $\theta$. Since $\rho \models \theta^{-1}(E_n)$ it follows that $\rho \circ \theta^{-1} \models E_n$. Proposition 7
implies that there exists a ground derivation $\mathcal{D}'$ from $\rho \circ \theta^{-1}(G_0)$ to $\rho \circ \theta^{-1}(G_n)$
such that the $i^{th}$ element of $\rho(G_n)$ is introduced using $\rho \circ \theta^{-1} \circ \theta$. By the
assumption on the atom selection function, the $i^{th}$ atom of $\rho(G_n)$ is selected.
This implies that $\mathcal{D}'$ selects $\rho(A^\alpha)$ under $\rho$ and so $\rho \in \mathcal{GCS}_P(\alpha)$. []

To complete the correctness proof of the environment constraints, we
shall now relate $\mathcal{GCS}_P$ with the least model of the environment constraints.
Specifically, where $\mathcal{I}_{gcs}$ denotes the interpretation that maps each $\Psi^\alpha$ into
$\mathcal{GCS}_P(\alpha)$, we show that $\mathcal{I}_{gcs} = lm(\mathcal{EC}_P)$. The proof of this consists of two
parts. The first part proves that $\mathcal{I}_{gcs}$ is a model of $\mathcal{EC}_P$. The second part
proves that, for any model $\mathcal{I}$ of $\mathcal{EC}_P$, $\mathcal{I}_{gcs} \subseteq \mathcal{I}$.

We begin by proving the following proposition on combining parts of
derivations. Note that this proposition only holds for atom selection rules
satisfying the following criterion: if an atom $A$ is selected from a ground goal
$G$, then there exists a ground derivation starting from $G$ that solves $A$ before
selecting any of the other atoms in $G$. In general, this condition may not
be satisfied, and this will mean that the lemma will not hold. In this case,
environment constraints can be used to obtain a conservative approximation
of the collecting semantics, but cannot be used to characterize it exactly.
However, in the case of the left-to-right and interleaving selection functions,
the criterion is satisfied, and the environment constraints correspond exactly
to the collecting semantics.

Proposition 9 If $\mathcal{D}$ is a ground derivation from $G$ to $G'$ such that $A$ is
selected from $G'$ and $\mathcal{D}'$ is a ground derivation that solves $A$, then there
exists a ground derivation $\mathcal{D}''$ from $G$ to $G''$ that minimally solves $A$ where
$G''$ is the result of deleting $A$ from $G'$.

Proof (for left-to-right semantics): Let $G'$ be $A_1, A_2, \ldots, A_n$. The proof
uses the ground derivation $\mathcal{D}'$ to construct a ground derivation $\mathcal{D}''$ from $G$
to $A_2, \ldots, A_n$. In particular, we shall construct a derivation $\mathcal{D}''$ of the form

\[ G_0 \xrightarrow{j_1,\alpha_1,\rho_1} G_1 \xrightarrow{j_2,\alpha_2,\rho_2} G_2 \xrightarrow{j_3,\alpha_3,\rho_3} \cdots \]

where each goal $G_i$ in this derivation has the form $Seq_i, A_2, \ldots, A_n$ such that
$Seq_i$ contains exactly the set of descendants of $A_1$ that appear in $G_i$. That is,
(a) if $A_1$ or any of its descendants appear in $G_i$, then they appear in $Seq_i$, and
(b) any atom in $Seq_i$ is either $A_1$ or a descendant of $A_1$. The construction
of $\mathcal{D}''$ proceeds as follows. The initial ground goal $G_0$ is just $G$ and it is
clear that if $Seq_0$ is set to be $A_1$ then conditions (a) and (b) are satisfied.
Now, suppose that $G_{i-1}$ can be written as $Seq_{i-1}, A_2, \ldots, A_n$ such that (a)
and (b) are satisfied, and consider defining the $i^{th}$ step of $\mathcal{D}''$. If $Seq_{i-1}$ is
empty then the construction of $\mathcal{D}''$ is complete. Otherwise, let $Seq_{i-1}$ be of
the form $B, Seq_{i-1}'$; that is, $B$ is the first atom in the sequence and
$Seq_{i-1}'$ contains all but the first atom in $Seq_{i-1}$. Since the selection function
at hand is the leftmost selection function, $B$ is selected from $G_{i-1}$. By part
(b) of the invariant, $B$ is either $A_1$ or a descendant of $A_1$. Now, since $A_1$
is solved in $\mathcal{D}'$, it must be the case that $A_1$ and all of its descendants are
selected at some step in $\mathcal{D}'$. Hence $\mathcal{D}'$ must contain a step of the form
$\xrightarrow{i,\alpha,\rho}$ such that the rule with label $\alpha$ has the form $C_0 \leftarrow C_1, \ldots, C_r$
where $\rho(C_0) = B$. Now, define the $i^{th}$ step of $\mathcal{D}''$ to be $G_{i-1} \xrightarrow{1,\alpha,\rho} G_i$ where
$G_i$ is $\rho(C_1), \ldots, \rho(C_r), Seq_{i-1}', A_2, \ldots, A_n$. Clearly (a) and (b) are satisfied
when $Seq_i$ is $\rho(C_1), \ldots, \rho(C_r), Seq_{i-1}'$, and this completes the description of
the procedure to construct $\mathcal{D}''$.

Eventually  this  procedure  must  reach  a  point  such  that  Gi  does  not 
contain  Ax  or  any  of  the  descendants  of  Ax.  This  is  because  the  number 
of  steps  in  the  construction  of  P"  is  bounded  by  the  number  of  steps  in 
P'.  Hence  eventually  Seqi  is  empty,  and  this  yields  a  derivation  G  -** 
j42,  . . . ,  An,  which  is  just  G  with  the  selected  atom  Ax  deleted.  [] 

Proof (for interleaving semantics): Again let $G'$ be $A_1, A_2, \ldots, A_n$.
The main difference between this proof and the one given above for the
left-to-right semantics is that any of the atoms $A_i$ may be selected. Suppose
that the atom selection function selects $A_i$ from $G'$. Then, we need to
construct a ground derivation $\mathcal{D}''$ from $G$ to $A_1, \ldots, A_{i-1}, A_{i+1}, \ldots, A_n$. Again,
this construction is guided by $\mathcal{D}'$. The only difference is that this time
the constructed ground derivation $\mathcal{D}''$ consists of ground goals of the form
$A_1, \ldots, A_{i-1}, Seq_i, A_{i+1}, \ldots, A_n$. The construction process is a
straightforward adaptation of that for the left-to-right case. []
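The left-to-right construction in the proof of Proposition 9 can be illustrated with a small sketch. It assumes ground atoms are plain strings and that the guiding derivation $\mathcal{D}'$ is summarised by a hypothetical dictionary `guide` mapping each atom selected in $\mathcal{D}'$ to the ground rule body used for it; neither the function name nor this representation comes from the text.

```python
def solve_first_atom(goal, guide):
    """Rewrite only the leftmost atom until A_1 and its descendants are gone.

    goal  -- list of ground atoms A_1, ..., A_n (plain strings here)
    guide -- hypothetical stand-in for the derivation D': maps each atom
             selected in D' to the ground rule body that D' used for it
    """
    seq = [goal[0]]         # Seq_0 = A_1
    rest = list(goal[1:])   # A_2, ..., A_n are never selected
    trace = [seq + rest]    # G_0
    while seq:              # terminates because D' is finite (guide is acyclic)
        selected, tail = seq[0], seq[1:]     # B and Seq'_{i-1}
        seq = list(guide[selected]) + tail   # rho(C_1), ..., rho(C_r), Seq'_{i-1}
        trace.append(seq + rest)             # G_i
    return trace


guide = {"p": ["q", "r"], "q": [], "r": ["s"], "s": []}
trace = solve_first_atom(["p", "t", "u"], guide)
```

In this toy run the final goal in `trace` is the initial goal with its first atom deleted, and the atoms `t` and `u` are never rewritten, which mirrors invariants (a) and (b).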

Lemma 4 $\mathcal{I}_{gcs}$ is a model of $\mathcal{EC}_P$.

Proof (for left-to-right semantics): Let $G_0$ be the initial goal, and
consider a constraint in $\mathcal{EC}_P$. Now, this constraint could either correspond
to (a) a rule in $P$ or (b) the initial goal $G_0$. We consider these cases in turn.

Suppose that the constraint corresponds to a rule $R$ with label $\alpha_{n+1}$, and
let $R$ be of the form $A_0 \leftarrow A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$. Then the constraint must be of the
form

$$\Psi^{\alpha_{j+1}} \supseteq [A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_j \in B_j.\Psi^{\beta_j}] \qquad (4.4)$$

where $0 \le j \le n$, $\beta_0$ is a body atom label such that $B_0^{\beta_0}$ is a body atom in
$P$ and $A_0$ and $B_0$ are compatible, and the $\beta_i$, $i = 1..j$, are rule labels such
that $B_i^{\beta_i}$ is a head atom in $P$ and $A_i$ and $B_i$ are compatible. Now, suppose
that $\rho$ is an element of $\mathcal{I}_{gcs}([A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_j \in B_j.\Psi^{\beta_j}])$.

This means that (a) $\rho(A_0) \in \mathcal{I}_{gcs}(B_0.\Psi^{\beta_0})$ and (b) $\rho(A_i) \in \mathcal{I}_{gcs}(B_i.\Psi^{\beta_i})$,
$i = 1..j$.

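From (a), there exists a $\rho' \in \mathcal{I}_{gcs}(\Psi^{\beta_0})$ such that $\rho(A_0) = \rho'(B_0)$. By
definition of $\rho' \in \mathcal{I}_{gcs}(\Psi^{\beta_0})$, there exists a ground instance $G'_0$ of $G_0$ and a ground
derivation $\mathcal{D}$ from $G'_0$ to $G$ such that $G$ is of the form $\rho'(B_0), C_1, \ldots, C_m$.
Now, let $\mathcal{D}'$ be the ground derivation that combines $\mathcal{D}$ with the following
additional ground derivation step (using rule $R$ with environment $\rho$)

$$\rho'(B_0), C_1, \ldots, C_m \longrightarrow \rho(A_1), \ldots, \rho(A_n), C_1, \ldots, C_m$$

Clearly $\mathcal{D}'$ is a ground derivation from $G'_0$ to $\rho(A_1), \ldots, \rho(A_n), C_1, \ldots, C_m$.

From (b), there exist $\rho_i \in \mathcal{I}_{gcs}(\Psi^{\beta_i})$ such that $\rho(A_i) = \rho_i(B_i)$, $i = 1..j$.
By definition of $\rho_i \in \mathcal{I}_{gcs}(\Psi^{\beta_i})$, $\rho_i$ is such that there is a ground
derivation $\mathcal{D}_i$ from some ground instance of $G_0$ such that $\mathcal{D}_i$ minimally solves
$\rho_i(B_i)$.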

In summary, there exists a ground derivation $\mathcal{D}'$ from $G'_0$ (a ground
instance of the initial goal $G_0$) to $\rho(A_1), \ldots, \rho(A_n), C_1, \ldots, C_m$, and, for
each $i$, there exist derivations $\mathcal{D}_i$ that minimally solve each $\rho(A_i)$. Now,
Proposition 9 can be applied to combine $\mathcal{D}'$ with $\mathcal{D}_1$ to produce a ground
derivation $\mathcal{D}'_1$ from $G'_0$ to $\rho(A_2), \ldots, \rho(A_n), C_1, \ldots, C_m$. Proposition 9 can
again be applied, this time to $\mathcal{D}'_1$ and $\mathcal{D}_2$, to produce a ground derivation
$\mathcal{D}'_2$ from $G'_0$ to $\rho(A_3), \ldots, \rho(A_n), C_1, \ldots, C_m$. Repeating this process proves that
there is a ground derivation $\mathcal{D}'_j$ from $G'_0$ to $\rho(A_{j+1}), \ldots, \rho(A_n), C_1, \ldots, C_m$.
Now, if $j < n$ then $\mathcal{D}'_j$ selects $\rho(A_{j+1})$ under $\rho$ and so $\rho \in \mathcal{I}_{gcs}(\Psi^{\alpha_{j+1}})$.
On the other hand, if $j = n$, then $\mathcal{D}'_j$ returns from $R^{\alpha_{n+1}}$ under $\rho$ and so
$\rho \in \mathcal{I}_{gcs}(\Psi^{\alpha_{n+1}})$. Hence in either case the constraint is satisfied.

Now suppose that the constraint corresponds to the initial goal $G_0$. The
argument that such a constraint is satisfied by $\mathcal{I}_{gcs}$ is essentially a repeat of
the argument for constraints corresponding to rules. Let $G_0$ be of the form
$A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$, and let the label of this goal be $\alpha_{n+1}$. Then the constraint
must be of the form

$$\Psi^{\alpha_{j+1}} \supseteq [A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_j \in B_j.\Psi^{\beta_j}]$$

where $0 \le j \le n$, and the $\beta_i$, $i = 1..j$, are rule labels such that $B_i^{\beta_i}$ is a
head atom in $P$ and $A_i$ and $B_i$ are compatible. Now, suppose that $\rho$ is an
element of $\mathcal{I}_{gcs}([A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_j \in B_j.\Psi^{\beta_j}])$. This means that
$\rho(A_i) \in \mathcal{I}_{gcs}(B_i.\Psi^{\beta_i})$, $i = 1..j$. Hence, there exist $\rho_i \in \mathcal{I}_{gcs}(\Psi^{\beta_i})$ such that
$\rho(A_i) = \rho_i(B_i)$, $i = 1..j$. By definition of $\rho_i \in \mathcal{I}_{gcs}(\Psi^{\beta_i})$, $\rho_i$ is such that there
is a ground derivation $\mathcal{D}_i$ from some ground instance of $G_0$ such that $\mathcal{D}_i$
minimally solves $\rho_i(B_i)$. Moreover, $\rho(A_1), \ldots, \rho(A_n)$ is an instance of $G_0$,
and hence there is a derivation (of length 0) from an instance of $G_0$ to
$\rho(A_1), \ldots, \rho(A_n)$.

In summary, there exists a ground derivation $\mathcal{D}'$ from some instance of
$G_0$ to $\rho(A_1), \ldots, \rho(A_n)$, and, for $i = 1..j$, there exists a derivation $\mathcal{D}_i$ that
minimally solves $\rho(A_i)$. Again, Proposition 9 can be repeatedly applied to
show that there is a ground derivation $\mathcal{D}'_j$ from this instance of $G_0$ to
$\rho(A_{j+1}), \ldots, \rho(A_n)$. Now, if $j < n$ then $\mathcal{D}'_j$ selects $\rho(A_{j+1})$ under $\rho$ and so
$\rho \in \mathcal{I}_{gcs}(\Psi^{\alpha_{j+1}})$. On the other hand, if $j = n$, then $\mathcal{D}'_j$ minimally solves $G_0$
under $\rho$ and so $\rho \in \mathcal{I}_{gcs}(\Psi^{\alpha_{n+1}})$. Hence in either case the constraint is satisfied. []

Proof (for interleaving semantics): The proof for interleaving semantics
closely follows the structure of the proof for left-to-right semantics. Let $G_0$
be the initial goal, and consider a constraint in $\mathcal{EC}_P$. Now, this constraint
could either correspond to (a) a rule in $P$ or (b) the initial goal $G_0$. We
consider these cases in turn. Suppose that the constraint corresponds to a
rule $R$ with label $\alpha_{n+1}$, and let $R$ be of the form $A_0 \leftarrow A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$. Then
the constraint must have one of the following two forms

$$\Psi^{\alpha_i} \supseteq [A_0 \in B_0.\Psi^{\beta_0}]$$

$$\Psi^{\alpha_{n+1}} \supseteq [A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_n \in B_n.\Psi^{\beta_n}]$$

where $1 \le i \le n$, $\beta_0$ is a body atom label such that $B_0^{\beta_0}$ is a body atom in
$P$ and $A_0$ and $B_0$ are compatible, and the $\beta_i$, $i = 1..n$, are rule labels such
that $B_i^{\beta_i}$ is a head atom in $P$ and $A_i$ and $B_i$ are compatible.

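Consider a constraint of the first form, and suppose that $\rho \in \mathcal{I}_{gcs}([A_0 \in B_0.\Psi^{\beta_0}])$.
Then there exists an environment $\rho' \in \mathcal{I}_{gcs}(\Psi^{\beta_0})$ such that
$\rho(A_0) = \rho'(B_0)$. By definition of $\rho' \in \mathcal{I}_{gcs}(\Psi^{\beta_0})$, there exists a ground
instance $G'_0$ of $G_0$ and a ground derivation $\mathcal{D}$ from $G'_0$ to $G$ such that $\mathcal{D}$
selects $\rho'(B_0^{\beta_0})$ under $\rho'$. Let $G$ be $C_1, \ldots, C_m$; it must be the case that
there exists an $i \le m$ such that $C_i$ is $\rho(A_0)$. Hence, $\mathcal{D}$ can be extended,
using the rule $A_0 \leftarrow A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$ with environment $\rho$, into a derivation $\mathcal{D}'$
from $G'_0$ to $C_1, \ldots, C_{i-1}, \rho(A_1^{\alpha_1}), \ldots, \rho(A_n^{\alpha_n}), C_{i+1}, \ldots, C_m$. Since the atom
selection function of the interleaved semantics may select any of the atoms
from this goal, it follows that $\rho \in \mathcal{I}_{gcs}(\Psi^{\alpha_j})$, $j = 1..n$.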

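Now consider a constraint of the second form, and suppose that
$\rho \in \mathcal{I}_{gcs}([A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_n \in B_n.\Psi^{\beta_n}])$. This means that (a)
$\rho(A_0) \in \mathcal{I}_{gcs}(B_0.\Psi^{\beta_0})$ and (b) $\rho(A_i) \in \mathcal{I}_{gcs}(B_i.\Psi^{\beta_i})$, $i = 1..n$. Reasoning as
before, (a) implies that there exists a ground derivation $\mathcal{D}$ from $G'_0$ to $G$
such that $G'_0$ is an instance of $G_0$ and $G$ is of the form $C_1, \ldots, C_m$ where,
for some $i$, $1 \le i \le m$, $C_i$ is $\rho(A_0)$. Clearly $\mathcal{D}$ can be extended to give a
ground derivation $\mathcal{D}'$ from $G'_0$ to

$$C_1, \ldots, C_{i-1}, \rho(A_1^{\alpha_1}), \ldots, \rho(A_n^{\alpha_n}), C_{i+1}, \ldots, C_m.$$

Also, (b) implies that there exist $\rho_i \in \mathcal{I}_{gcs}(\Psi^{\beta_i})$ such that $\rho(A_i) = \rho_i(B_i)$,
$i = 1..n$. By definition of $\rho_i \in \mathcal{I}_{gcs}(\Psi^{\beta_i})$, $\rho_i$ is such that there is a ground
derivation $\mathcal{D}_i$ from some ground instance of $G_0$ such that $\mathcal{D}_i$ minimally
solves $\rho_i(B_i)$, and hence minimally solves $\rho(A_i)$. The derivations $\mathcal{D}'$ and
$\mathcal{D}_j$ can be combined using Proposition 9 (see the proof for the left-to-right
semantics for more details) to obtain a derivation $\mathcal{D}''$ from $G'_0$ to
$C_1, \ldots, C_{i-1}, C_{i+1}, \ldots, C_m$. Clearly $\mathcal{D}''$ returns from $R^{\alpha_{n+1}}$ under $\rho$ and
so $\rho \in \mathcal{I}_{gcs}(\Psi^{\alpha_{n+1}})$.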

Now suppose that the constraint corresponds to the initial goal $G_0$. Let
$G_0$ be of the form $A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$, and let the label of this goal be $\alpha_{n+1}$.
Then the constraint must have one of the following two forms

$$\Psi^{\alpha_j} \supseteq [\,]$$

$$\Psi^{\alpha_{n+1}} \supseteq [A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_n \in B_n.\Psi^{\beta_n}]$$

where $1 \le j \le n$ and the $\beta_i$, $i = 1..n$, are rule labels such that $B_i^{\beta_i}$ is a
head atom in $P$ and $A_i$ and $B_i$ are compatible. The argument that such
a constraint is satisfied by $\mathcal{I}_{gcs}$ is essentially a repeat of the argument for
constraints corresponding to program rules. The main observation is that,
for any environment $\rho$, there is a single step derivation $\mathcal{D}$ consisting of the
goal $\rho(A_1^{\alpha_1}), \ldots, \rho(A_n^{\alpha_n})$. This implies that $\rho \in \mathcal{I}_{gcs}(\Psi^{\alpha_j})$ since any atom in
the goal $\rho(A_1^{\alpha_1}), \ldots, \rho(A_n^{\alpha_n})$ may be selected in the interleaved semantics.
Moreover, if $\rho \in \mathcal{I}_{gcs}([A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_n \in B_n.\Psi^{\beta_n}])$ then we can show that
there exist derivations $\mathcal{D}_j$, $j = 1..n$, such that $\mathcal{D}_j$ solves $\rho(A_j)$, and then
Proposition 9 can be applied to combine $\mathcal{D}$ with the $\mathcal{D}_j$ to prove that
$\rho \in \mathcal{I}_{gcs}(\Psi^{\alpha_{n+1}})$. []

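Lemma 5 If $\mathcal{I}$ is a model of $\mathcal{EC}_P$ then $\mathcal{I}_{gcs} \subseteq \mathcal{I}$.

Proof (for left-to-right semantics): Let $G_0$ be the initial goal and let $\mathcal{I}$
be a model of $\mathcal{EC}_P$. To prove the lemma, we need to show that if $\rho \in \mathcal{I}_{gcs}(\Psi^{\alpha})$
then $\rho \in \mathcal{I}(\Psi^{\alpha})$, where $\alpha$ ranges over all program labels. Recalling the
definition of $GCS_P$, this can be reduced to the following property: if $\mathcal{D}$ is a
ground derivation from some ground instance $G'_0$ of $G_0$ then

(a) if $\mathcal{D}$ selects $A^{\alpha}$ under $\rho$ then $\rho \in \mathcal{I}(\Psi^{\alpha})$, and

(b) if $\mathcal{D}$ returns from $R^{\alpha}$ under $\rho$ then $\rho \in \mathcal{I}(\Psi^{\alpha})$.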


We shall prove this using an induction argument whose hypothesis is:

Let $\mathcal{D}$ be a ground derivation and suppose that the sequence
of goals in $\mathcal{D}$ is $G'_0, G'_1, \ldots, G'_N$ where $G'_0$ is a ground instance
of $G_0$. If, for some $n$, $0 \le n \le N$, it is the case that

(0) all atoms $A^{\alpha}$ in $G'_n$ that are solved by $\mathcal{D}$ are such that
$A \in \mathcal{I}(B.\Psi^{\beta})$ for some head atom $B^{\beta}$ in $P$,

then the following conditions hold for all $k \le n$:

(1) if $\mathcal{D}$ selects $A^{\alpha}$ and $\mathcal{D}$ introduces $A^{\alpha}$ at step $k$ using $\rho$
then $\rho \in \mathcal{I}(\Psi^{\alpha})$, and

(2) if $\mathcal{D}$ uses rule $R^{\alpha}$ and environment $\rho$ at step $k$ and the
atoms introduced by step $k$ are solved in the subsequent
steps of $\mathcal{D}$ then $\rho \in \mathcal{I}(\Psi^{\alpha})$.

In essence, parts (1) and (2) of this hypothesis are a restricted form of (a)
and (b). Note that in the case where $n = N$, part (0) of the induction
hypothesis becomes vacuously true, and parts (1) and (2) are equivalent to
(a) and (b). Hence, if $n = N$, then the induction hypothesis is equivalent to
the lemma.

We prove the hypothesis by induction on $n$. In the base case $n = 0$ and
$G'_n$ is an instance of the initial goal $G_0$. Let $G_0$ have the form $A_1^{\alpha_1}, \ldots, A_r^{\alpha_r}$
and let the label of $G_0$ be $\alpha_{r+1}$. Then $G'_0$ has the form $\rho(A_1^{\alpha_1}), \ldots, \rho(A_r^{\alpha_r})$
where $\rho$ is some environment. Assume that (0) holds, and consider (1) and
(2). Since $k \le 0$, the only possible value for $k$ is 0.

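To prove (1), suppose that $\mathcal{D}$ selects $A^{\alpha}$ and $\mathcal{D}$ introduces $A^{\alpha}$ at step
0 using $\rho'$. This means that $A^{\alpha}$ must appear in $G'_0$, and so there exists a
$j$ such that $A^{\alpha}$ is $\rho(A_j^{\alpha_j})$. Now, $\mathcal{D}$ follows a left-to-right execution strategy
and so it is easy to verify that $\mathcal{D}$ must solve $A_1^{\alpha_1}, \ldots, A_{j-1}^{\alpha_{j-1}}$. This is because
if $G'_n$ contains any descendants of $A_1^{\alpha_1}, \ldots, A_{j-1}^{\alpha_{j-1}}$ then these descendants
must appear to the left of $\rho(A_j^{\alpha_j})$ and hence $\rho(A_j^{\alpha_j})$ could not be selected
from $G'_n$. Since $\mathcal{D}$ solves $A_1^{\alpha_1}, \ldots, A_{j-1}^{\alpha_{j-1}}$, assumption (0) can be applied to
prove that for all $i$, $1 \le i < j$, there exists a head atom $B_i^{\beta_i}$ in $P$ such that
$\rho(A_i^{\alpha_i}) \in \mathcal{I}(B_i.\Psi^{\beta_i})$. Now, corresponding to $G_0$, $\mathcal{EC}_P$ contains the constraint

$$\Psi^{\alpha_j} \supseteq [A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_{j-1} \in B_{j-1}.\Psi^{\beta_{j-1}}].$$

Clearly $\rho \in \mathcal{I}([A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_{j-1} \in B_{j-1}.\Psi^{\beta_{j-1}}])$, and since $\mathcal{I}$ is a
model of $\mathcal{EC}_P$, it follows that $\rho \in \mathcal{I}(\Psi^{\alpha_j})$.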


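To prove (2), suppose that $\mathcal{D}$ uses rule $R^{\alpha}$ and environment $\rho$ at step 0
and the atoms introduced by step 0 are solved in the subsequent steps of $\mathcal{D}$.
This implies that $G'_N$ is empty and that all atoms in $G'_0$ are solved. Arguing
as above, assumption (0) can be used to show that for all $i$, $1 \le i \le r$,
there exists a head atom $B_i^{\beta_i}$ in $P$ such that $\rho(A_i^{\alpha_i}) \in \mathcal{I}(B_i.\Psi^{\beta_i})$. Since $\mathcal{EC}_P$
contains the constraint

$$\Psi^{\alpha_{r+1}} \supseteq [A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_r \in B_r.\Psi^{\beta_r}],$$

it follows that $\rho \in \mathcal{I}(\Psi^{\alpha_{r+1}})$.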


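To prove the inductive case, suppose that for some $n'$ the hypothesis
holds for $n = n' - 1$, and we seek to prove the hypothesis when $n = n'$.
Assume that (0) holds. Let the step from $G'_{n-1}$ to $G'_n$ be of the form

$$C_1^{\gamma_1}, \ldots, C_m^{\gamma_m} \longrightarrow \rho(A_1^{\alpha_1}), \ldots, \rho(A_r^{\alpha_r}), C_2^{\gamma_2}, \ldots, C_m^{\gamma_m}$$

where the rule in $P$ with label $\alpha_{r+1}$ is $A_0 \leftarrow A_1^{\alpha_1}, \ldots, A_r^{\alpha_r}$ such that $\rho(A_0) = C_1$.
Now, consider the derivation $\mathcal{D}'$ consisting of the first $n - 1$ steps of
$\mathcal{D}$. Clearly $\mathcal{D}'$ is a derivation from $G'_0$ to $G'_{n-1}$ that selects $C_1^{\gamma_1}$. Now, the
hypothesis is assumed to hold for $n = n' - 1$, and when applied to derivation
$\mathcal{D}'$, condition (0) is vacuous and so (1) implies that $\rho' \in \mathcal{I}(\Psi^{\gamma_1})$ where $\rho'$ is
the environment used to introduce $C_1^{\gamma_1}$. Let $C'_1$ be the body atom in $P$ with
label $\gamma_1$. Then $\rho(A_0) = C_1 = \rho'(C'_1)$ and hence

$$\rho(A_0) \in \mathcal{I}(C'_1.\Psi^{\gamma_1}) \qquad (4.5)$$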


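Now, corresponding to the rule $A_0 \leftarrow A_1^{\alpha_1}, \ldots, A_r^{\alpha_r}$, $\mathcal{EC}_P$ contains constraints
of the form

$$\Psi^{\alpha_{i'+1}} \supseteq [A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_{i'} \in B_{i'}.\Psi^{\beta_{i'}}]$$

where $i'$ ranges over $0..r$, $\beta_0$ ranges over body atom labels such that $B_0^{\beta_0}$
is a body atom in $P$ and $A_0$ and $B_0$ are compatible, and the $\beta_i$, $i \ge 1$,
range over rule labels such that $B_i^{\beta_i}$ is a head atom in $P$ and $A_i$ and $B_i$
are compatible. Now (4.5) establishes that there is a body atom $B_0^{\beta_0}$ in $P$
such that $\rho(A_0) \in \mathcal{I}(B_0.\Psi^{\beta_0})$. Combining this with the fact that $\mathcal{I}$ satisfies all
constraints in $\mathcal{EC}_P$ proves that, for $i' = 0..r$,

$$\text{if } \rho(A_1) \in \mathcal{I}(B_1.\Psi^{\beta_1}), \ldots, \rho(A_{i'}) \in \mathcal{I}(B_{i'}.\Psi^{\beta_{i'}}) \text{ then } \rho \in \mathcal{I}(\Psi^{\alpha_{i'+1}}) \qquad (4.6)$$

where $\beta_1, \ldots, \beta_{i'}$ are body atom labels. Note that the condition that $A_i$ and
$B_i$ are compatible, which is associated with the construction of an environment
constraint, is subsumed by the condition $\rho(A_i) \in \mathcal{I}(B_i.\Psi^{\beta_i})$ and hence
does not appear in (4.6).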

Before proving (1) and (2), we first establish that condition (0) holds for
$G'_{n'-1}$. This is necessary because the assumption that the hypothesis holds
for $\mathcal{D}$ in the case where $n = n' - 1$ is a statement of the form: if (0) holds
with $n = n' - 1$ then (1) and (2) hold with $n = n' - 1$. Hence, to make use
of this assumption, we shall have to show that "(0) holds with $n = n' - 1$".
Specifically, we need to show that

all atoms $A^{\alpha}$ in $G'_{n'-1}$ that are solved by $\mathcal{D}$ are such that
$A \in \mathcal{I}(B.\Psi^{\beta})$ for some head atom $B^{\beta}$ in $P$.

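To prove this, let $A^{\alpha}$ be an atom in $C_1^{\gamma_1}, \ldots, C_m^{\gamma_m}$ that is solved in $\mathcal{D}$. Now,
if $A^{\alpha}$ is not $C_1^{\gamma_1}$ then $A^{\alpha}$ appears in $G'_n$ and so (0) implies that
$A \in \mathcal{I}(B.\Psi^{\beta})$ for some head atom $B^{\beta}$ in $P$. On the other hand, if $A^{\alpha}$ is $C_1^{\gamma_1}$ then
$\rho(A_1^{\alpha_1}), \ldots, \rho(A_r^{\alpha_r})$ must be solved in $\mathcal{D}$. Now, the assumption (0) (for $G'_n$)
implies that, for $i = 1..r$, $\rho(A_i^{\alpha_i}) \in \mathcal{I}(B_i.\Psi^{\beta_i})$ for some head atom $B_i^{\beta_i}$ in $P$.
Combining this with (4.6) proves that $\rho \in \mathcal{I}(\Psi^{\alpha_{r+1}})$. Since $A = C_1 = \rho(A_0)$,
this implies that $A \in \mathcal{I}(A_0.\Psi^{\alpha_{r+1}})$.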

Now, consider (1) and (2). If $k < n$, then (1) and (2) follow from the
induction hypothesis and the fact that (0) holds for $G'_{n'-1}$ (which has just
been proved). Now consider the case where $k = n$. To prove (1), suppose
that $\mathcal{D}$ selects $A^{\alpha}$ and $\mathcal{D}$ introduces $A^{\alpha}$ at step $k$ using $\rho'$. Since $k = n$, it
must be the case that $r \ge 1$, $\rho' = \rho$ and $A^{\alpha}$ is $\rho(A_j^{\alpha_j})$ for some $j$ in the range
$1..r$. Since $\mathcal{D}$ follows a left-to-right execution strategy, it is easy to verify
that $\mathcal{D}$ must solve $\rho(A_1^{\alpha_1}), \ldots, \rho(A_{j-1}^{\alpha_{j-1}})$. This is because if $G'_n$ contains any
descendants of $\rho(A_1^{\alpha_1}), \ldots, \rho(A_{j-1}^{\alpha_{j-1}})$ then these descendants would appear to
the left of $\rho(A_j^{\alpha_j})$ and hence $\rho(A_j^{\alpha_j})$ could not be selected from $G'_n$. Since $\mathcal{D}$
solves $A_1^{\alpha_1}, \ldots, A_{j-1}^{\alpha_{j-1}}$, assumption (0) can be applied to prove that for all $i$,
$1 \le i < j$, there exists a head atom $B_i^{\beta_i}$ in $P$ such that $\rho(A_i^{\alpha_i}) \in \mathcal{I}(B_i.\Psi^{\beta_i})$.
Hence (4.6) implies that $\rho \in \mathcal{I}(\Psi^{\alpha_j})$.

To prove (2), suppose that $\mathcal{D}$ uses rule $R^{\alpha}$ and environment $\rho'$ at step $k$
and the atoms introduced by step $k$ are solved in the subsequent steps of $\mathcal{D}$.
Since $k = n$, it must be the case that $\alpha$ is $\alpha_{r+1}$, $\rho = \rho'$ and the subsequent
steps of $\mathcal{D}$ solve $\rho(A_1^{\alpha_1}), \ldots, \rho(A_r^{\alpha_r})$. From the assumption that (0) holds
for $G'_n$, it follows that for each $i$, $i = 1..r$, $\rho(A_i) \in \mathcal{I}(B_i.\Psi^{\beta_i})$ for some head
atom $B_i^{\beta_i}$ in $P$. Hence (4.6) implies that $\rho \in \mathcal{I}(\Psi^{\alpha_{r+1}})$. []

Proof (for interleaving semantics): The proof for the top-down
interleaving semantics is very similar to that for the top-down left-to-right case.
The induction hypothesis used is identical and the base and inductive cases
for (2) are also the same. The only difference is in the base and inductive
cases for (1) where the proof is in fact simpler than in the left-to-right case.

Recall that in the base case $n = 0$ and $G'_0$ is an instance of the initial
goal $G_0$. Let $G_0$ have the form $A_1^{\alpha_1}, \ldots, A_r^{\alpha_r}$ and let the label of $G_0$ be $\alpha_{r+1}$.
Then $G'_0$ has the form $\rho(A_1^{\alpha_1}), \ldots, \rho(A_r^{\alpha_r})$ where $\rho$ is some environment.
Assume that (0) holds, and consider (1). Since $k \le 0$, the only possible
value for $k$ is 0. Suppose that $\mathcal{D}$ selects $A^{\alpha}$ and $\mathcal{D}$ introduces $A^{\alpha}$ at step 0
using $\rho'$. This means that $A^{\alpha}$ must appear in $G'_0$, and so there exists an $i$
such that $A^{\alpha}$ is $\rho(A_i^{\alpha_i})$. Now, the environment constraints corresponding to
$G_0$ include the constraint $\Psi^{\alpha_i} \supseteq [\,]$, and it follows that $\rho \in \mathcal{I}(\Psi^{\alpha_i})$.

The proof of (2) when $n = 0$ given previously for the left-to-right semantics
is in fact independent of the selection strategy at hand. Hence it can be
used here without modification; we omit the repetition. This completes the
base case.


Now suppose that for some $n'$ the hypothesis holds for $n = n' - 1$, and
we seek to prove the hypothesis when $n$ is $n'$. Assume that (0) holds.
Let the step from $G'_{n-1}$ to $G'_n$ be of the form


where the rule in $P$ with label $\alpha_{r+1}$ is $A_0 \leftarrow A_1^{\alpha_1}, \ldots, A_r^{\alpha_r}$ such that $\rho(A_0) = C_1$.
Now, consider the derivation $\mathcal{D}'$ consisting of the first $n - 1$ steps of
$\mathcal{D}$. Clearly $\mathcal{D}'$ is a derivation from $G'_0$ to $G'_{n-1}$ that selects $C_1^{\gamma_1}$. Now, the
hypothesis is assumed to hold for $n = n' - 1$, and when applied to derivation
$\mathcal{D}'$, condition (0) is vacuous and so (1) implies that $\rho' \in \mathcal{I}(\Psi^{\gamma_1})$ where $\rho'$ is
the environment used to introduce $C_1^{\gamma_1}$. Let $C'_1$ be the body atom in $P$ with
label $\gamma_1$. Then $\rho(A_0) = C_1 = \rho'(C'_1)$ and hence $\rho(A_0) \in \mathcal{I}(C'_1.\Psi^{\gamma_1})$. Now,
corresponding to the rule $A_0 \leftarrow A_1^{\alpha_1}, \ldots, A_r^{\alpha_r}$, $\mathcal{EC}_P$ contains constraints of the
following two forms:

$$\Psi^{\alpha_j} \supseteq [A_0 \in B_0.\Psi^{\beta_0}]$$

$$\Psi^{\alpha_{r+1}} \supseteq [A_0 \in B_0.\Psi^{\beta_0}, A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_r \in B_r.\Psi^{\beta_r}]$$

where $j$ ranges over $1..r$, $\beta_0$ ranges over body atom labels such that $B_0^{\beta_0}$ is a
body atom in $P$ and $A_0$ and $B_0$ are compatible, and the $\beta_i$, $i \ge 1$, range over
rule labels such that $B_i^{\beta_i}$ is a head atom in $P$ and $A_i$ and $B_i$ are compatible.
Now we have just established that there is a body atom $B_0^{\beta_0}$ in $P$ such that
$\rho(A_0) \in \mathcal{I}(B_0.\Psi^{\beta_0})$. Combining this with the fact that $\mathcal{I}$ satisfies all constraints
in $\mathcal{EC}_P$ proves that

(a) $\rho \in \mathcal{I}(\Psi^{\alpha_i})$, $i = 1..r$

(b) if $\rho(A_1) \in \mathcal{I}(B_1.\Psi^{\beta_1}), \ldots, \rho(A_r) \in \mathcal{I}(B_r.\Psi^{\beta_r})$ then $\rho \in \mathcal{I}(\Psi^{\alpha_{r+1}})$ $\qquad$ (4.7)

where $\beta_1, \ldots, \beta_r$ are body atom labels.

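As in the left-to-right case, we begin by proving that condition (0) holds
for $G'_{n'-1}$. That is, we prove that all atoms $A^{\alpha}$ in $G'_{n'-1}$ that are solved by
$\mathcal{D}$ are such that $A \in \mathcal{I}(B.\Psi^{\beta})$ for some head atom $B^{\beta}$ in $P$. To prove this,
let $A^{\alpha}$ be an atom in $C_1^{\gamma_1}, \ldots, C_m^{\gamma_m}$ that is solved in $\mathcal{D}$. Now, if $A^{\alpha}$ is not $C_1^{\gamma_1}$
then $A^{\alpha}$ appears in $G'_n$ and so (0) implies that $A \in \mathcal{I}(B.\Psi^{\beta})$ for some head
atom $B^{\beta}$ in $P$. On the other hand, if $A^{\alpha}$ is $C_1^{\gamma_1}$ then $\rho(A_1^{\alpha_1}), \ldots, \rho(A_r^{\alpha_r})$
must be solved in $\mathcal{D}$. Now, the assumption (0) (for $G'_n$) implies that, for
$j = 1..r$, $\rho(A_j^{\alpha_j}) \in \mathcal{I}(B_j.\Psi^{\beta_j})$ for some head atom $B_j^{\beta_j}$ in $P$. Combining this
with part (b) of (4.7) proves that $\rho \in \mathcal{I}(\Psi^{\alpha_{r+1}})$. Since $A = C_1 = \rho(A_0)$, this
implies that $A \in \mathcal{I}(A_0.\Psi^{\alpha_{r+1}})$.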

Now, consider (1) and (2). If $k < n$, then (1) and (2) follow from the
induction hypothesis and the fact that (0) holds for $G'_{n'-1}$ (which has just
been proved). Now consider the case where $k = n$. To prove (1), suppose
that $\mathcal{D}$ selects $A^{\alpha}$ and $\mathcal{D}$ introduces $A^{\alpha}$ at step $k$ using $\rho'$. Since $k = n$, it
must be the case that $r \ge 1$, $\rho' = \rho$ and $A^{\alpha}$ is $\rho(A_j^{\alpha_j})$ for some $j$ in the range
$1..r$. That $\rho \in \mathcal{I}(\Psi^{\alpha_j})$ follows immediately from part (a) of (4.7).

Again, the proof for the inductive case of (2) that was given for the
left-to-right semantics is independent of the selection strategy at hand, and can
be used here without modification. This completes the inductive case. []

Theorem 4 (Correctness of Top-Down Constraints)

For all programs $P$ and all labels $\alpha$ in $P$,

$$lm(\mathcal{EC}_P)(\Psi^{\alpha}) =_{var(\alpha)} \{\rho : \rho \models E \text{ and } E \in CS_P(\alpha)\}.$$

Proof: Let $\alpha$ be a label in $P$. From Lemma 4 and Lemma 5 it follows
that $lm(\mathcal{EC}_P)(\Psi^{\alpha})$ is equal to $\mathcal{I}_{gcs}(\Psi^{\alpha})$. By definition, $\mathcal{I}_{gcs}(\Psi^{\alpha})$ is
just $GCS_P(\alpha)$. Finally, Lemma 3 proves that
$GCS_P(\alpha) =_{var(\alpha)} \{\rho : \rho \models E \text{ and } E \in CS_P(\alpha)\}$, and this completes the proof. []


4.6.2  Bottom-up  Semantics 

The proof of correctness for the bottom-up semantics is similar to that for the
top-down semantics, although it is substantially simpler. Again we define
a ground notion of derivability corresponding to the earlier definition of
bottom-up derivability. Specifically, a ground atom $A$ is ground bottom-up
derivable if there is a ground instance $A \leftarrow A_1, \ldots, A_n$ of a rule in $P$
such that the ground atoms $A_1, \ldots, A_n$ are ground bottom-up derivable. As before,
ground bottom-up derivability is closely linked to bottom-up derivability.
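As a small illustration of this definition, the following sketch computes the ground bottom-up derivable atoms of a finite, already-grounded program by naive iteration. The string encoding of atoms, the function name and the example rules are assumptions made for the example only.

```python
def ground_derivable(ground_rules):
    """Return the set of ground bottom-up derivable atoms.

    ground_rules -- iterable of (head, body) pairs, body a tuple of atoms;
                    a hypothetical, already-grounded finite program
    """
    derivable = set()
    changed = True
    while changed:
        changed = False
        for head, body in ground_rules:
            # A head becomes derivable once every body atom is derivable.
            if head not in derivable and all(b in derivable for b in body):
                derivable.add(head)
                changed = True
    return derivable


rules = [
    ("edge(a,b)", ()),
    ("edge(b,c)", ()),
    ("path(a,b)", ("edge(a,b)",)),
    ("path(b,c)", ("edge(b,c)",)),
    ("path(a,c)", ("edge(a,b)", "path(b,c)")),
    ("path(c,a)", ("edge(c,a)",)),
]
atoms = ground_derivable(rules)
```

Here `path(c,a)` is never added because its only rule instance has the underivable body atom `edge(c,a)`.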


Proposition 10 If $(E : A)$ is bottom-up derivable and $\rho \models E$ then $\rho(A)$ is
ground bottom-up derivable.


Proof: The proof proceeds by induction on the definition of bottom-up
derivable. Let $(E_i : B_i)$, $i = 1..n$, be bottom-up derivable states that satisfy
the proposition, let $A \leftarrow A_1, \ldots, A_n$ be a rule in $P$, and let $\theta_1, \ldots, \theta_n$ be
renaming substitutions such that $var(A, A_1, \ldots, A_n)$, $var(\theta_1(E_1), \theta_1(B_1))$,
$\ldots$, $var(\theta_n(E_n), \theta_n(B_n))$ are all disjoint sets. Under these assumptions,
we need to show that $(E : A)$ satisfies the proposition, where $E$ is
$(A_1 = \theta_1(B_1)) \wedge \theta_1(E_1) \wedge \cdots \wedge (A_n = \theta_n(B_n)) \wedge \theta_n(E_n)$. To this end, assume that
$\rho \models E$. This implies that $\rho$ is a model for each $\theta_i(E_i)$, and it follows that
$\rho(\theta_i(B_i))$ is ground bottom-up derivable, $i = 1..n$, because each $(E_i : B_i)$ is assumed
to satisfy the proposition. $\rho \models E$ also implies that $\rho(A_i)$ is identical to
$\rho(\theta_i(B_i))$. Hence $\rho(A) \leftarrow \rho(A_1), \ldots, \rho(A_n)$ is a ground instance of a rule in
$P$, and it follows that $\rho(A)$ is ground bottom-up derivable. []

Proposition 11 If $A$ is ground bottom-up derivable then there exists a
bottom-up derivable state $(E : B)$ and an environment $\rho$ such that $\rho \models E$
and $\rho(B)$ is $A$.


Proof: The proof proceeds by induction on the definition of ground
bottom-up derivable. Let $A_1, \ldots, A_n$ be ground bottom-up derivable atoms that
satisfy the proposition, let $A \leftarrow A_1, \ldots, A_n$ be a ground instance of a rule in
$P$, and we seek to show that the proposition holds for $A$. Since $A_1, \ldots, A_n$
satisfy the proposition, it must be the case that for each $i$, $i = 1..n$, there
exists a bottom-up derivable state $(E_i : B'_i)$ and an environment $\rho_i$ such
that $\rho_i \models E_i$ and $\rho_i(B'_i)$ is $A_i$. Also, since $A \leftarrow A_1, \ldots, A_n$ is a ground
instance of a rule in $P$, there exists a rule in $P$ of the form $B \leftarrow B_1, \ldots, B_n$
and an environment $\rho_R$ such that $A = \rho_R(B)$ and $A_i = \rho_R(B_i)$, $i = 1..n$.
Let $\theta_1, \ldots, \theta_n$ be renaming substitutions such that $var(B, B_1, \ldots, B_n)$,
$var(\theta_1(E_1), \theta_1(B'_1))$, $\ldots$, $var(\theta_n(E_n), \theta_n(B'_n))$ are disjoint sets, and define
$E$ to be $(B_1 = \theta_1(B'_1)) \wedge \theta_1(E_1) \wedge \cdots \wedge (B_n = \theta_n(B'_n)) \wedge \theta_n(E_n)$. Define an
environment $\rho$ as follows:

$$\rho(X) = \begin{cases} \rho_i(\theta_i^{-1}(X)) & \text{if } X \in var(\theta_i(E_i), \theta_i(B'_i)),\ 1 \le i \le n \\ \rho_R(X) & \text{otherwise} \end{cases}$$

By definition, $\rho(\theta_i(X)) = \rho_i(\theta_i^{-1}(\theta_i(X))) = \rho_i(X)$ for each variable $X$
appearing in $E_i$, and hence $\rho \models \theta_i(E_i)$ iff $\rho_i \models E_i$. It has already been
established that $\rho_i \models E_i$ and so $\rho \models \theta_i(E_i)$, $i = 1..n$. Similarly, the
definition of $\rho$ implies that $\rho(\theta_i(B'_i)) = \rho_i(B'_i)$, $i = 1..n$. Combining this equality
with previously established equalities proves that

$$\rho(\theta_i(B'_i)) = \rho_i(B'_i) = A_i = \rho_R(B_i) = \rho(B_i), \quad i = 1..n,$$

and it follows that $\rho$ satisfies each equation $B_i = \theta_i(B'_i)$, and hence $\rho \models E$.
Moreover, $\rho(B) = \rho_R(B) = A$. Hence $(E : B)$ is a bottom-up derivable state
and $\rho$ is an environment such that $\rho \models E$ and $\rho(B)$ is $A$. []
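The combined environment built in this proof can be sketched concretely. The sketch below assumes environments and renamings are finite dictionaries, atoms are tuples of a predicate name followed by variable names, and the renamed variables are disjoint from the rule variables (as the proof requires); the names `combine` and `apply_env` are illustrative, not taken from the text.

```python
def apply_env(env, atom):
    """Instantiate an atom, given as (predicate, var, ...), under env."""
    name, *args = atom
    return (name, *[env[v] for v in args])


def combine(envs, renamings, rule_env):
    """Build rho from rho_1..rho_n, renamings theta_1..theta_n and rho_R.

    envs      -- list of dicts rho_i: variable -> ground term
    renamings -- list of dicts theta_i: variable -> fresh variable (injective)
    rule_env  -- dict rho_R for the variables of the rule B <- B_1, ..., B_n
    """
    rho = dict(rule_env)  # the "otherwise" case of the definition
    for rho_i, theta_i in zip(envs, renamings):
        for x, renamed in theta_i.items():
            # rho(theta_i(X)) = rho_i(theta_i^{-1}(theta_i(X))) = rho_i(X)
            rho[renamed] = rho_i[x]
    return rho


# Hypothetical rule: path(X,Z) <- edge(X,Y), path(Y,Z)
rho_R = {"X": "a", "Y": "b", "Z": "c"}
rho_1, theta_1 = {"U": "a", "V": "b"}, {"U": "U1", "V": "V1"}  # state for edge(U,V)
rho_2, theta_2 = {"U": "b", "V": "c"}, {"U": "U2", "V": "V2"}  # state for path(U,V)
rho = combine([rho_1, rho_2], [theta_1, theta_2], rho_R)
```

With these inputs, `rho` maps both sides of each equation $B_i = \theta_i(B'_i)$ to the same ground atom, which is the step used in the proof to conclude $\rho \models E$.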
Using ground bottom-up derivability, a ground collecting semantics can
now be defined.

Definition 9 The ground bottom-up collecting semantics of a logic program
$P$ is the mapping $GCS_P$ such that, for each rule label $\alpha$,

$$GCS_P(\alpha) = \{\rho : \text{the rule with label } \alpha \text{ is } A \leftarrow A_1, \ldots, A_n \text{ and } \rho(A_1), \ldots, \rho(A_n) \text{ are ground bottom-up derivable}\}$$

[]
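For a finite universe of constants, Definition 9 can be illustrated by enumerating environments directly. This is a sketch under that finiteness assumption; the tuple encoding of atoms, the function name `gcs` and the example data are hypothetical.

```python
from itertools import product


def gcs(rule_vars, body, derivable, universe):
    """Enumerate environments rho such that every rho(A_i) is derivable.

    rule_vars -- variables of the rule with the given label
    body      -- body atoms A_1, ..., A_n as (predicate, var, ...) tuples
    derivable -- set of ground bottom-up derivable atoms (as tuples)
    universe  -- finite list of constants (an assumption of this sketch)
    """
    result = []
    for values in product(universe, repeat=len(rule_vars)):
        rho = dict(zip(rule_vars, values))
        if all((name, *[rho[v] for v in args]) in derivable
               for name, *args in body):
            result.append(rho)
    return result


derivable = {
    ("edge", "a", "b"), ("edge", "b", "c"),
    ("path", "a", "b"), ("path", "b", "c"), ("path", "a", "c"),
}
# Hypothetical rule: path(X,Z) <- edge(X,Y), path(Y,Z)
envs = gcs(["X", "Y", "Z"], [("edge", "X", "Y"), ("path", "Y", "Z")],
           derivable, ["a", "b", "c"])
```

In this example only the environment sending `X`, `Y`, `Z` to `a`, `b`, `c` makes both body atoms derivable.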

Using the previous two propositions, which relate ground derivations with
derivations, the following correspondence can be established.

Lemma 6 For all rule labels $\alpha$,

$$GCS_P(\alpha) =_{var(\alpha)} \{\rho : \rho \models E \text{ for some } E \in CS_P(\alpha)\}.$$

Proof: The first part of the proof shows that for any environment $\rho$ in the
set on the left hand side of this equation, there exists an environment $\rho'$ in the
right hand side such that $\rho =_{var(\alpha)} \rho'$. Suppose that $\rho \in GCS_P(\alpha)$ and let the
rule with label $\alpha$ be $A \leftarrow A_1, \ldots, A_n$. This implies that $\rho(A_1), \ldots, \rho(A_n)$ are
ground bottom-up derivable. Hence, by Proposition 11, there exist bottom-up
derivable states $(E_i : B_i)$ and environments $\rho_i$ such that $\rho_i \models E_i$ and
$\rho_i(B_i)$ is $\rho(A_i)$, $i = 1..n$. Now, let $\theta_1, \ldots, \theta_n$ be renaming substitutions
such that $var(A, A_1, \ldots, A_n)$, $var(\theta_1(E_1), \theta_1(B_1))$, $\ldots$, $var(\theta_n(E_n), \theta_n(B_n))$
are all disjoint sets. By definition, $(E : A)$ is bottom-up derivable where $E$
is $(A_1 = \theta_1(B_1)) \wedge \theta_1(E_1) \wedge \cdots \wedge (A_n = \theta_n(B_n)) \wedge \theta_n(E_n)$. Now, define an
environment $\rho'$ as follows:

$$\rho'(X) = \begin{cases} \rho_i(\theta_i^{-1}(X)) & \text{if } X \in var(\theta_i(E_i), \theta_i(B_i)),\ 1 \le i \le n \\ \rho(X) & \text{otherwise} \end{cases}$$

Using reasoning similar to that used in Proposition 11, it is easy to verify
that $\rho'$ is a model of each $\theta_i(E_i)$ and that $\rho'(A_i) = \rho'(\theta_i(B_i))$, $i = 1..n$. This
implies that $\rho' \models E$, and so $\rho' \in \{\rho : \rho \models E \text{ for some } E \in CS_P(\alpha)\}$. Since
$\rho' =_{var(\alpha)} \rho$, the proof for this direction is complete.

Conversely, suppose that $\rho \models E$ such that $E \in CS_P(\alpha)$. If the rule
in $P$ with label $\alpha$ is $A \leftarrow A_1, \ldots, A_n$, then by definition of $CS_P$, there must
exist bottom-up derivable states $(E_i : B_i)$ and renamings $\theta_i$, $1 \le i \le n$, such
that $var(A, A_1, \ldots, A_n)$, $var(\theta_1(E_1), \theta_1(B_1))$, $\ldots$, $var(\theta_n(E_n), \theta_n(B_n))$ are
disjoint sets, and such that $E$ is
$(A_1 = \theta_1(B_1)) \wedge \theta_1(E_1) \wedge \cdots \wedge (A_n = \theta_n(B_n)) \wedge \theta_n(E_n)$.
Since $\rho \models E$, it follows that $\rho \models \theta_i(E_i)$, $i = 1..n$. This implies that
$\rho \circ \theta_i \models E_i$. From Proposition 10 it follows that $\rho \circ \theta_1(B_1), \ldots, \rho \circ \theta_n(B_n)$
are all ground bottom-up derivable. Also, $\rho \models E$ implies that $\rho(A_i) = \rho \circ \theta_i(B_i)$,
and hence $\rho(A_1), \ldots, \rho(A_n)$ are ground bottom-up derivable. Thus $\rho \in GCS_P(\alpha)$,
and the lemma is proved. []

Note that the set of ground derivable goals corresponds to the usual
bottom-up semantics of logic programs [7] defined using the $T_P$ function (see
Section 4.2 on page 63). Specifically, $A$ is ground bottom-up derivable
iff $A \in lfp(T_P)$. The above theorem can now be used to show that the
bottom-up semantics defined in Section 4.2 is equivalent to the standard $T_P$
bottom-up semantics in the following sense:

$$lfp(T_P) = \{\rho(A) : (E : A) \text{ is bottom-up derivable and } \rho \models E\}$$
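
As an illustration of the $T_P$ side of this equivalence, the following is a minimal Python sketch of the least fixed point computation for a finite set of ground rules. The rule representation (a head atom name paired with a tuple of body atom names) and the function names `t_p` and `lfp_t_p` are assumptions made for this example only; they are not notation from the thesis, and the sketch does not handle non-ground rules.

```python
# Hypothetical representation: a ground rule is (head, body), where head is an
# atom written as a string and body is a tuple of such atoms.

def t_p(rules, facts):
    """One application of the immediate consequence operator T_P."""
    return {head for head, body in rules if all(b in facts for b in body)}


def lfp_t_p(rules):
    """Iterate T_P from the empty set until nothing changes.

    Terminates for a finite set of ground rules because T_P is monotone and
    only finitely many head atoms exist.
    """
    facts = set()
    while True:
        new_facts = t_p(rules, facts)
        if new_facts == facts:
            return facts
        facts = new_facts


example_rules = [
    ("p(a)", ()),                 # a fact
    ("q(a)", ("p(a)",)),          # q(a) <- p(a)
    ("r(a)", ("q(a)", "s(a)")),   # r(a) <- q(a), s(a); s(a) is never derived
]
derived = lfp_t_p(example_rules)
```

Here `derived` contains exactly the ground atoms that are bottom-up derivable from `example_rules`.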

We now complete the proof of the correctness of the bottom-up semantics.
Recall that the proof so far has connected bottom-up collecting semantics
with bottom-up ground collecting semantics. The remaining part
of the proof connects the ground collecting semantics with the environment
constraints, and consists of two lemmas. Again $\mathcal{I}_{gcs}$ denotes the
mapping such that $\mathcal{I}_{gcs}(\Psi^\alpha) = \mathcal{G}CS_P(\alpha)$.
We begin with the following easy property.


Proposition 12 If $A^\alpha$ is a head atom in $P$ then each atom in
$\mathcal{I}_{gcs}(A.\Psi^\alpha)$ is ground bottom-up derivable.


Proof: Let $A_0 \leftarrow A_1, \ldots, A_n$ be the rule in $P$ with label
$\alpha$ and suppose that $A \in \mathcal{I}_{gcs}(A_0.\Psi^\alpha)$. This
implies that there is an environment $\rho$ such that
$\rho \in \mathcal{I}_{gcs}(\Psi^\alpha)$ and $A = \rho(A_0)$. Now,
$\rho \in \mathcal{I}_{gcs}(\Psi^\alpha)$ implies that
$\rho(A_1), \ldots, \rho(A_n)$ are ground bottom-up derivable. It immediately
follows that $\rho(A_0)$ is ground bottom-up derivable, and since
$A = \rho(A_0)$, this completes the proof that $A$ is ground bottom-up
derivable. []

Lemma 7 $\mathcal{I}_{gcs}$ is a model of $\mathcal{EC}_P$.


Proof: Each constraint in $\mathcal{EC}_P$ is of the form

$$\Psi^\alpha \supseteq [A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_n \in B_n.\Psi^{\beta_n}]$$

such that the rule in $P$ with label $\alpha$ is
$A_0 \leftarrow A_1, \ldots, A_n$ and each $\beta_i$ is a rule label such that
$B_i^{\beta_i}$ is a head atom in $P$ and $B_i$ and $A_i$ are compatible. Now,
suppose that
$\rho \in \mathcal{I}_{gcs}([A_1 \in B_1.\Psi^{\beta_1}, \ldots, A_n \in B_n.\Psi^{\beta_n}])$.
Hence $\rho(A_i) \in \mathcal{I}_{gcs}(B_i.\Psi^{\beta_i})$, $i = 1..n$.
Proposition 12 implies that each $\rho(A_i)$ is bottom-up derivable and it is
immediate that $\rho \in \mathcal{I}_{gcs}(\Psi^\alpha)$. []

Lemma 8 If $\mathcal{I}$ is a model of $\mathcal{EC}_P$ then
$\mathcal{I}_{gcs} \subseteq \mathcal{I}$.

Proof: The proof uses induction on the definition of ground bottom-up
derivability. To establish the basis for this induction, we first refine the
definition of bottom-up derivability. Specifically, define that $\rho(A)$ is
ground bottom-up derivable with index $k$ if there is an environment $\rho$
and a rule in $P$ of the form $A \leftarrow A_1, \ldots, A_n$ such that each
$\rho(A_i)$ is ground bottom-up derivable with index $k_i$ and
$k = 1 + k_1 + \cdots + k_n$. Clearly $A$ is ground bottom-up derivable iff
there exists a $k \ge 1$ such that $A$ is ground bottom-up derivable with
index $k$. The lemma is now established by proving the following hypothesis:

    For all rule labels $\alpha$ and environments $\rho$, if the rule in $P$
    with label $\alpha$ is $A \leftarrow A_1, \ldots, A_n$ and each
    $\rho(A_i)$ is bottom-up derivable with index $k_i < k$, then
    $\rho \in \mathcal{I}(\Psi^\alpha)$.

It is easy to verify, using the definition of $\mathcal{G}CS_P$, that if this
hypothesis holds for all $k$ then $\mathcal{I}_{gcs} \subseteq \mathcal{I}$.
The proof of the hypothesis proceeds by induction on $k$. Suppose that the
hypothesis holds for all $k$ less than $k'$, where $k'$ is some non-negative
integer, and we seek to show the hypothesis when $k = k'$. Let $\alpha$ be a
rule label, let $A \leftarrow A_1, \ldots, A_n$ be the rule in $P$ with label
$\alpha$ and let $\rho$ be an environment such that each $\rho(A_i)$ is
bottom-up derivable with index $k_i < k'$. This means that for each $i$ there
is a rule label $\beta$ and an environment $\rho'$ such that
$B \leftarrow B_1, \ldots, B_r$ is the rule in $P$ with label $\beta$,
$\rho'(B) = \rho(A_i)$ and, for each $j$, $j = 1..r$, $\rho'(B_j)$ is ground
bottom-up derivable with index $l_j < k_i$. Since the hypothesis is assumed to
hold for $k_i$, it follows that $\rho' \in \mathcal{I}(\Psi^\beta)$ and so
$\rho(A_i) \in \mathcal{I}(B.\Psi^\beta)$. In summary then, for $i = 1..n$
there exists a head atom $C_i^{\gamma_i}$ in $P$ such that
$\rho(A_i) \in \mathcal{I}(C_i.\Psi^{\gamma_i})$. Now, by construction,
$\mathcal{EC}_P$ contains the constraint

$$\Psi^\alpha \supseteq [A_1 \in C_1.\Psi^{\gamma_1}, \ldots, A_n \in C_n.\Psi^{\gamma_n}]$$

Since $\mathcal{I}$ is assumed to be a model of $\mathcal{EC}_P$, it
immediately follows that $\rho \in \mathcal{I}(\Psi^\alpha)$. []

Theorem  5  (Correctness  of  Bottom-Up  Constraints) 

For all programs $P$ and all rule labels $\alpha$ in $P$,

$$lm(\mathcal{EC}_P)(\Psi^\alpha) =_{var(\alpha)} \{\rho : \rho \models E \text{ and } E \in CS_P(\alpha)\}.$$


Proof:  This  theorem  follows  immediately  from  lemmas  6,  7  and  8.  [] 
Chapter 5

Set  Based  Approximation 


Environment constraints provide a flexible framework for reasoning about
programs. In particular, by re-interpreting the basic constructs of the
constraints, we can define approximations of a program's collecting semantics.
Such an approach is used in this chapter to define the set based
approximation of a program. We begin by developing interpretations of the
environment constraints that do not contain inter-variable dependencies.
Central to these interpretations is the treatment of program variables as
sets, and this is formalized using set environments, which are mappings from
program variables into sets. The resulting interpretations are called set
based interpretations. The set based approximation of a program is defined to
be the smallest set based interpretation that is a model of the program's
environment constraints. That is, the smallest (standard) interpretation of
the environment constraints gives the collecting semantics of the program, and
the smallest set based interpretation of the environment constraints gives
the set based approximation of (the collecting semantics of) the program.

A key part of this chapter involves formalizing inter-variable dependencies.
While this notion is fairly clear at an intuitive level, it has no a priori
definition. In fact there are a number of potential definitions, and these are
outlined during the development of set based interpretations. Importantly,
there is one definition that is more natural than the others, and this
definition is used as the basis for set based program approximation. This
provides a key justification of our claim that set based approximation makes
exactly one approximation: all inter-variable dependencies are ignored.
5.1 Summary of Environment Constraints


We begin by summarizing the form of the environment constraints used to
characterize the collecting semantics of programs. We first consolidate some
of the basic definitions for logic and imperative programs. Let $\Sigma$
denote a common set of function symbols that subsumes the function symbols
used in imperative programs as well as the function and predicate symbols used
in logic programs. Let VAR denote a common set of program variables for both
logic and imperative programs. Note that VAR is infinite. A program term
is either a program variable or of the form $f(t_1, \ldots, t_n)$ or
$f_{(i)}^{-1}(t_1)$ where each $t_i$ is a program term, $f \in \Sigma$ and
$1 \le i \le n$. Note that the definition of program term encompasses both
logic and imperative program terms as well as logic program atoms. A value is
a program term constructed only from symbols in $\Sigma$. An environment is a
mapping from VAR into values. Again we shall write
$[X_1 \mapsto v_1, \ldots, X_n \mapsto v_n]$ to denote an environment that
maps $X_i$ into $v_i$, $i = 1..n$. We shall frequently abuse this notation in
examples and write expressions such as
$\{[X \mapsto v_1, Y \mapsto v_2], [X \mapsto v_3, Y \mapsto v_4]\}$, denoting
the infinite set of all environments that either map $X$ to $v_1$ and $Y$ to
$v_2$ or else map $X$ to $v_3$ and $Y$ to $v_4$.

An atomic program condition is of the form $s = t$, $\neg(s = t)$,
$match_f(t)$ or $\neg(match_f(t))$ where $f$ is a function symbol from
$\Sigma$ and $s$ and $t$ are program terms. A program condition is a
disjunction of conjunctions of atomic program conditions. Note that this
definition of program condition is, strictly speaking, a restriction of the
previous definition since it only allows "disjunctive normal form" program
conditions to be written. This is done for convenience. Moreover, it is an
inconsequential restriction since any program condition can be easily
rewritten into an equivalent disjunction of conjunctions of atomic program
conditions.

An environment variable is a variable that ranges over sets of environments,
and is denoted by the symbol $\Psi$. For each program point $p$, there is a
distinguished environment variable denoted $\Psi^p$, whose purpose is to
describe the environments corresponding to point $p$. An environment
expression is either


•  an environment variable $\Psi$;

•  the constant $\top$;

•  $\Psi[X \mapsto t]$, where $X$ is a program variable and $t$ is a program term;

•  $\Psi[cond]$, where $cond$ is a program condition;

•  $[A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]$, where the $A_i$ and $B_i$ are program terms.

An environment constraint is of the form $\Psi \supseteq ee$ where $ee$ is an
environment expression.

We now review the meaning of environment constraints. First, the meaning of
program terms is defined. Given an environment $\rho$, the meaning $\rho(t)$
of a term $t$ is defined by extending $\rho$ as follows:

•  $\rho(f(t_1, \ldots, t_n)) = f(\rho(t_1), \ldots, \rho(t_n))$.

•  $\rho(f_{(i)}^{-1}(t')) = v_i$ if $\rho(t') = f(v_1, \ldots, v_n)$ for some values $v_1, \ldots, v_n$.

Note that $\rho(f_{(i)}^{-1}(t'))$ is undefined if $\rho(t')$ is not of the
form $f(v_1, \ldots, v_n)$. We write $\rho \triangleright t$ if $t$ is defined
under $\rho$. Next, the meaning of program conditions is defined. As mentioned
previously, each program condition is written as a disjunction of conjunctions
of atomic program conditions. We write $\rho \triangleright cond$ if
$\rho \triangleright t$ for each program term $t$ appearing in $cond$. Now,
where $\rho$ is an environment such that $\rho \triangleright cond$, the
relation $\rho \models cond$ is defined as follows:

•  $\rho \models s = t$ iff $\rho(s) = \rho(t)$;

•  $\rho \models \neg(s = t)$ iff $\rho(s) \ne \rho(t)$;

•  $\rho \models match_f(t)$ iff $\rho(t)$ is of the form $f(v_1, \ldots, v_n)$;

•  $\rho \models \neg(match_f(t))$ iff $\rho(t)$ is of the form $g(v_1, \ldots, v_n)$ where $f \ne g$;

•  $\rho \models cond_1 \wedge cond_2$ iff $\rho \models cond_1$ and $\rho \models cond_2$;

•  $\rho \models cond_1 \vee cond_2$ iff either $\rho \models cond_1$ or $\rho \models cond_2$.

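If it is not the case that $\rho \triangleright cond$ then
$\rho \models cond$ is not defined. Finally, an interpretation of environment
constraints is a mapping from each environment variable into a set of
environments. Given such an interpretation $\mathcal{I}$, the meaning of an
environment expression is defined as follows:

•  $\mathcal{I}(\Psi)$ is already defined.

•  $\mathcal{I}(\top)$ = {all environments}.

•  $\mathcal{I}(\Psi[X \mapsto t]) = \{\rho[X \mapsto \rho(t)] : \rho \in \Theta\}$ where $\Theta$ is $\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright t\}$.

•  $\mathcal{I}(\Psi[cond]) = \{\rho \in \Theta : \rho \models cond\}$ where $\Theta$ is $\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright cond\}$.

•  $\mathcal{I}([A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]) = \{\rho : \rho(A_i) \in \mathcal{I}(B_i.\Psi_i),\ i = 1..n\}$.

The expression $\mathcal{I}(B.\Psi)$ denotes the set of ground atoms
$\{\rho(B) : \rho \in \mathcal{I}(\Psi)\}$.

An interpretation is a model of a conjunction of environment constraints
$\mathcal{EC}$ if $\mathcal{I}(\Psi) \supseteq \mathcal{I}(ee)$ for each
constraint $\Psi \supseteq ee$ in $\mathcal{EC}$.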
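
The meaning of terms and of an equality condition under a single environment can be sketched in Python as follows. This is only an illustrative sketch: the class names `Var`, `App` and `Proj`, the use of `None` to signal that a term is undefined under an environment, and the restriction to finite lists of environments are all assumptions of the example rather than part of the formal development.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    """f(t1, ..., tn); a constant a is App("a")."""
    fn: str
    args: Tuple["Term", ...] = ()


@dataclass(frozen=True)
class Proj:
    """The projection f_(i)^{-1}(t), with i counted from 1."""
    fn: str
    index: int
    arg: "Term"


Term = Union[Var, App, Proj]
Env = Dict[str, App]  # values are App terms that contain no variables


def evaluate(env: Env, t: Term) -> Optional[App]:
    """Return the value of t under env, or None if t is undefined under env."""
    if isinstance(t, Var):
        return env[t.name]
    if isinstance(t, App):
        args = tuple(evaluate(env, a) for a in t.args)
        if any(a is None for a in args):
            return None
        return App(t.fn, args)
    inner = evaluate(env, t.arg)
    if inner is None or inner.fn != t.fn:
        return None
    if not 1 <= t.index <= len(inner.args):
        return None
    return inner.args[t.index - 1]


def holds_eq(env: Env, s: Term, t: Term) -> Optional[bool]:
    """Satisfaction of s = t; None when either side is undefined."""
    vs, vt = evaluate(env, s), evaluate(env, t)
    if vs is None or vt is None:
        return None
    return vs == vt


def restrict_eq(envs, s: Term, t: Term):
    """Standard meaning of Psi[s = t] for a finite list of environments."""
    return [env for env in envs if holds_eq(env, s, t)]


env1 = {"X": App("a"), "Y": App("f", (App("a"), App("b")))}
env2 = {"X": App("a"), "Y": App("a")}
first_of_y = Proj("f", 1, Var("Y"))
kept = restrict_eq([env1, env2], first_of_y, Var("X"))
```

In this example `env2` is dropped because the projection is undefined when `Y` is bound to the constant `a`, which mirrors the restriction to environments under which the condition is defined.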

5.2 Inter-Variable Dependencies in $\mathcal{EC}_P$


What we ultimately desire is a simple, intuitive and decidable definition
of an approximation to a program's (collecting) semantics. One natural
way to obtain such an approximation is by developing a notion of approximate
interpretation of the environment constraints. Given such a notion, an
approximate semantics can be defined using the smallest approximate
interpretation that is a model of the constraints. That is, the least
(standard) model of the environment constraints defines the program's exact
semantics, and the least approximate model defines the program's approximate
semantics.

In essence, this is the approach used to define the set based approximation
of a program. Specifically, to obtain a notion of program approximation
that ignores inter-variable dependencies, we develop an interpretation of the
environment constraints that is free of such dependencies. Now, recall that
environment constraints are interpreted by starting with a mapping
$\mathcal{I}$ from environment variables into arbitrary sets of environments,
and then extending this mapping in an obvious way to map environment
expressions into sets of environments. Inter-variable dependencies arise in
two places in this interpretation. First, they may be present in the
collections of environments specified by $\mathcal{I}$. Second, they may be
present in the process of extending $\mathcal{I}$ to map from environment
expressions into sets of environments. We consider these possibilities in turn.
Inter-Variable Dependencies in Sets of Environments

We begin our development of an interpretation of the environment constraints
that ignores inter-variable dependencies by considering the inter-variable
dependencies that may be present in collections of environments.
For example, suppose that one of the collections of environments specified
by an interpretation is the following collection of environments

$$\{[X \mapsto a, Y \mapsto b], [X \mapsto c, Y \mapsto d]\},$$

where it is assumed that the only program variables of interest are $X$ and
$Y$. Inter-variable dependencies are present in this collection of
environments in the sense that whenever $X$ takes the value $a$, then it must
be the case that $Y$ takes the value $b$, and whenever $X$ takes $c$, $Y$ must
take $d$.

Intuitively, a collection of environments is free of inter-variable
dependencies if fixing the value of one or more variables does not affect the
values that other variables may take. More concretely, let $\Theta$ be a
collection of environments and suppose that we choose an environment $\rho$
from $\Theta$ and modify $\rho$ so that it maps $X$ into $v$ where $v$ is one
of the "possible values" for $X$. If $\Theta$ is free of inter-variable
dependencies, then we expect that the modified environment should also be an
element of $\Theta$. For example, consider once again the collection of
environments $\{[X \mapsto a, Y \mapsto b], [X \mapsto c, Y \mapsto d]\}$;
denote this collection by $\Theta$. On choosing $[X \mapsto a, Y \mapsto b]$
from $\Theta$ and modifying this environment to map $Y$ into $d$ (one of the
"possible values" for $Y$), we obtain the environment
$[X \mapsto a, Y \mapsto d]$ and this is not contained in $\Theta$. Hence, as
expected, $\Theta$ is not free of inter-variable dependencies. However, a
collection of environments that is free of inter-variable dependencies can be
obtained by augmenting $\Theta$ with the environments
$[X \mapsto a, Y \mapsto d]$ and $[X \mapsto c, Y \mapsto b]$.

More formally, let $\Theta$ be a collection of environments and define, in
the context of $\Theta$, that $v$ is a possible value for $X$ if there exists
an environment $\rho$ in $\Theta$ such that $\rho(X) = v$. Now, consider the
environments $\rho'$ such that there exists an environment $\rho$ in $\Theta$
and for each $X$ either (a) $\rho'$ agrees with $\rho$ on $X$ or else (b)
$\rho'(X)$ is a possible value for $X$. Define that $\Theta$ is free of
inter-variable dependencies if $\Theta$ contains all such environments
$\rho'$. This provides a fairly direct formalization of the above intuitions
about inter-variable dependencies. However, observe that case (a) of the
construction of $\rho'$ is redundant because if $\rho'(X) = \rho(X)$ then
$\rho'(X)$ is a possible value for $X$. Hence, $\Theta$ is free of
inter-variable dependencies if $\Theta$ contains all environments $\rho$ such
that, for each $X$, $\rho(X)$ is a possible value for $X$. More compactly,
$\Theta$ is free of inter-variable dependencies if

$\rho \in \Theta$ iff for all $X$, $\rho(X)$ is a possible value for $X$.

If a collection $\Theta$ satisfies this condition, then $\Theta$ can be
completely described by the sets of possible values that it defines for each
program variable. In other words, such a collection $\Theta$ can be viewed as
a specification of a set of values for each program variable. Hence,
collections $\Theta$ that are free of inter-variable dependencies can be
characterized as set based collections of environments.

Definition 10 (Set Based Environment Collections) A collection $\Theta$
of environments is set based if there exists a mapping $F$ from program
variables into sets of values such that

$\rho \in \Theta$ iff $\rho(X) \in F(X)$ for all program variables $X$. []

For example,
$\{[X \mapsto a, Y \mapsto b], [X \mapsto a, Y \mapsto d], [X \mapsto c, Y \mapsto b], [X \mapsto c, Y \mapsto d]\}$
is set based because it can be represented by the function $F$ that maps $X$
into $\{a, c\}$ and $Y$ into $\{b, d\}$.
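
For finite collections over a fixed, finite set of variables, Definition 10 can be checked mechanically. The Python sketch below is an illustration under those assumptions: environments are plain dictionaries with identical keys, values are strings, and the helper names `possible_values`, `is_set_based` and `set_based_closure` are invented for this example.

```python
from itertools import product


def possible_values(envs):
    """Map each variable to the set of values it takes in some environment."""
    result = {}
    for env in envs:
        for var, val in env.items():
            result.setdefault(var, set()).add(val)
    return result


def set_based_closure(envs):
    """Smallest set based collection containing envs (finite case only)."""
    if not envs:
        return []
    f = possible_values(envs)
    variables = sorted(f)
    combos = product(*(sorted(f[v]) for v in variables))
    return [dict(zip(variables, combo)) for combo in combos]


def is_set_based(envs):
    """True iff envs contains every environment built from possible values."""
    if not envs:
        return True  # the empty collection is represented by an empty-set mapping
    as_tuples = lambda e: tuple(sorted(e.items()))
    actual = {as_tuples(env) for env in envs}
    expected = {as_tuples(env) for env in set_based_closure(envs)}
    return actual == expected


theta = [{"X": "a", "Y": "b"}, {"X": "c", "Y": "d"}]
closure = set_based_closure(theta)
```

With `theta` as above, `is_set_based(theta)` is false, while `closure` adds the two missing environments and is set based.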

The mapping $F$ in Definition 10 can be thought of as a representation of
$\Theta$. In most cases, there is only one set mapping $F$ that represents a
set based collection $\Theta$. However, in the boundary case where $\Theta$ is
the empty set, there are many possible choices for $F$ since any $F$ that maps
at least one program variable into the empty set is a candidate. It is
convenient to refine the notion of set mapping to obtain a unique
representation of set based collections of environments as follows.

Definition 11 (Set Environments) A set environment $\varrho$ is a mapping
from program variables into sets of values such that if $\varrho$ maps some
program variable into the empty set then it maps all program variables into
the empty set.
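
The uniqueness that Definition 11 is designed to give can be illustrated with a small normalization step. The following Python sketch assumes a finite set of variables and represents a set mapping as a dictionary from variable names to Python sets; the function name `normalize` is hypothetical.

```python
def normalize(set_mapping):
    """Turn an arbitrary set mapping F into a set environment.

    If any variable is mapped to the empty set, every variable is mapped to
    the empty set, so all representations of the empty collection coincide.
    """
    if any(len(values) == 0 for values in set_mapping.values()):
        return {var: set() for var in set_mapping}
    return {var: set(values) for var, values in set_mapping.items()}


# Two different mappings that both represent the empty collection.
f1 = {"X": {"a"}, "Y": set()}
f2 = {"X": set(), "Y": {"b", "d"}}
same = normalize(f1) == normalize(f2)
```

Both `f1` and `f2` normalize to the mapping that sends every variable to the empty set, so `same` is true.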

Clearly a collection $\Theta$ of environments is set based if there exists a
set environment $\varrho$ such that $\rho \in \Theta$ iff
$\rho(X) \in \varrho(X)$ for all $X$. Moreover, there is a one-to-one
correspondence between set based collections of environments and set
environments. It is convenient to identify a set environment with the
collection of environments it represents. To this end, we shall frequently
treat a set environment $\varrho$ as a set of environments and write
$\rho \in \varrho$ to denote that $\rho(X) \in \varrho(X)$ for each program
variable $X$.

Now, as a first step towards a definition of program approximation that
ignores inter-variable dependencies, we define a notion of program
approximation based on set environments. Specifically, where $P$ is a program,
let $approx_P$ denote the least interpretation that is both a model of
$\mathcal{EC}_P$ and maps each environment variable into a set based
collection of environments. Since $approx_P$ is a model of $\mathcal{EC}_P$
and $lm(\mathcal{EC}_P)$ is the least such model, it follows that
$lm(\mathcal{EC}_P) \subseteq approx_P$, and so $approx_P$ is a safe
approximation of the collecting semantics of $P$.

The definition of $approx_P$ leads to a very simple definition of program
approximation; however it is not decidable. The reason is that although
$approx_P$ removes inter-variable dependencies in the collections of
environments, it does not remove dependencies that may be introduced through
the action of program variables. For example, consider the imperative program
consisting of the single statement X := pair(X, X). Writing $\Psi$ and
$\Psi'$ for the environment variables of the program points before and after
this statement, the environment constraints corresponding to this program are
$\Psi' \supseteq \Psi[X \mapsto pair(X, X)]$ and $\Psi \supseteq \top$. For
this program, $lm(\mathcal{EC}_P)$ and $approx_P$ coincide and both map
$\Psi'$ into the set based collection of environments

$$[X \mapsto \{pair(v, v) : v \text{ is a value}\}].$$

Hence dependencies may be introduced by variables even though the collection
of environments $\mathcal{I}(\Psi')$ does not contain inter-variable
dependencies. It is these kinds of dependencies that lead to the
undecidability of $approx_P$.


Dependencies  Introduced  By  Program  Variables 

To describe the kinds of dependencies that may be introduced through the
action of program variables, we must first consider dependencies in sets
of values. A set of values contains dependencies if there are relationships
between the components of each value. For example, consider the two sets
of terms $\{f(a, b), f(c, d)\}$ and $\{f(g^n(a), g^n(b)) : n \ge 0\}$, where
$g^n$ is used to abbreviate $n$ applications of $g$, so that $g^2(c)$ denotes
$g(g(c))$. Both of these sets contain dependencies. Such dependencies may be
present in the sets of values variables may be bound to at run-time. Some of
these dependencies are explicitly introduced by the action of program
variables, while others are introduced through the merging of different
computation paths. To illustrate this, consider the four programs in
Figure 5.1. In both of the top two programs, the set of values for $Y$ after
program execution is $\{pair(v, v) : v \text{ is a value}\}$; the dependencies
in these two examples are introduced by the variable $Y$. In contrast, after
execution of either of the bottom two programs in Figure 5.1, the value of $Y$
is either $pair(a, b)$ or $pair(c, d)$; the dependencies here are introduced
by merging computation paths. Note that the dependencies introduced by
variables may be infinite in nature, whereas the dependencies introduced by
the merging of computation paths are essentially finite. This has important
decidability implications.

    logic program          imperative program

    p(pair(X,X)).          Y := pair(X,X);

    q(pair(a,b)).          Y := pair(a,b);
    q(pair(c,d)).          if match_pair(X) then
    ?- q(Y).                   Y := pair(c,d);

Figure 5.1: Different Ways of Introducing Dependencies

At the heart of set based analysis is the tradeoff between infinite sets
of values, dependencies and decidability. First observe that since we do
not employ an approximation of the underlying computation values, we
must deal directly with infinite sets of values. Also note that the notion of
intersection is inherently present in set based analysis because a conditional
expression of the form $X = Y$ naturally leads to the intersection of the sets
of values for $X$ and $Y$. Now, as we have just noted above, the action of
variables may introduce dependencies that are unbounded in nature. Such
dependencies imply that sets of the form
$\{f(g^n(a), g^n(b)) : n \ge 0\}$ may arise. This is suggestive of the
expressive power of context free grammars. Since emptiness of the intersection
of two context free grammars is undecidable, it is not surprising that
unbounded dependencies must be curtailed to obtain decidability.

A more concrete explanation of the undecidability of $approx_P$ is based on
the observation that the semantic constructs of the language are sufficiently
powerful that reasoning involving unbounded dependencies can still be carried
out, even though inter-variable dependencies cannot appear in sets of
environments. Specifically, let $P$ be an imperative program, and consider
coding $P$ as an essentially equivalent imperative program that involves only
one variable. To do this, let $X_1, \ldots, X_n$ be the variables in $P$, and
let $Env$ be a new variable that shall range over lists of length $n$,
representing environments. Let $Env^i$ denote $car(cdr^{i-1}(Env))$ (to access
the $i$-th element of the list in $Env$) and let $[s_1, \ldots, s_n]$ denote
the list of length $n$ whose elements are, in order, $s_1, \ldots, s_n$. Now,
construct a program $P'$ from $P$ by first replacing all assignment statements
$X_i := t$ by $Env := [X_1, \ldots, X_{i-1}, t, X_{i+1}, \ldots, X_n]$, and
then replacing all occurrences of $X_i$ by $Env^i$, $1 \le i \le n$. Clearly
the resulting $P'$ is equivalent to $P$ in the sense that if $P$ starts
execution from environment $[X_1 \mapsto u_1, \ldots, X_n \mapsto u_n]$ and
reaches program point $p$ with environment
$[X_1 \mapsto v_1, \ldots, X_n \mapsto v_n]$ then execution of $P'$ starting
from environment $[Env \mapsto [u_1, \ldots, u_n]]$ reaches point $p$ with
environment $[Env \mapsto [v_1, \ldots, v_n]]$. Moreover, $P'$ contains only
one variable. It follows that $lm(\mathcal{EC}_{P'}) = approx_{P'}$. Hence,
any oracle for deciding $\rho \in approx_{P'}(\Psi^p)$ can be used to decide
$\rho \in lm(\mathcal{EC}_{P'})(\Psi^p)$.

A similar kind of construction is possible for logic programs. The important
observation here is that $approx_P$ still allows an equality predicate
to be defined. Specifically, let $P$ be an arbitrary logic program. First
rewrite $P$ into an equivalent program that contains only unary predicate
symbols (this can be easily done by introducing new function symbols $f_n$ of
every arity $n$, and then systematically rewriting each atom
$p(s_1, \ldots, s_n)$ into $p(f_n(s_1, \ldots, s_n))$). Second, rewrite each
rule $p(s) \leftarrow B_1, \ldots, B_n$ into
$p(X) \leftarrow eq(X, s), B_1, \ldots, B_n$ where $X$ is a variable that does
not already appear in the rule. Third, add the rule $eq(X, X)$. Call the
resulting program $P'$. Clearly $P'$ mimics $P$ in a very straightforward
manner. Moreover, $approx_{P'}$ and $lm(\mathcal{EC}_{P'})$ coincide in the
following sense: for all head atoms $A^\alpha$ in $P'$

$$approx_{P'}(\Psi^\alpha) =_{var(A)} lm(\mathcal{EC}_{P'})(\Psi^\alpha).$$

In both of these constructions, the key observation is that variable
dependencies appear in $approx_P$ when an environment is applied to a term
with multiple occurrences of a variable. Such dependencies must be eliminated
if a decidable program approximation is to be obtained.

The approach for eliminating these dependencies is based on the use of
set environments. Recall that set environments were introduced to represent
set based collections of environments. Not only can set environments
be treated as collections of environments, but they can also be treated as
environment-like mappings. Whereas an environment is pointwise in the
sense that it specifies a single value for each variable, a set environment is
set-wise in the sense that it specifies a set of values for each variable. In
analogy with environments, there is a natural notion of changing the binding
of a variable in a set environment. Specifically, where $\varrho$ is a set
environment and $S$ is a set of values, the notation $\varrho[X \mapsto S]$
denotes the set environment that maps all variables into the empty set if
either $S$ or $\varrho(X)$ is the empty set, and otherwise is the set
environment that maps $X$ into $S$ and agrees with $\varrho$ on program
variables different from $X$.

Just as environments are used to assign meanings to program terms
and conditions, we can use set environments to assign an alternative "set
based" meaning to program terms and conditions. For example, suppose
that $X$ is the only variable of interest and consider an environment
expression $\Psi[X \mapsto pair(X, X)]$ corresponding to an assignment
statement. Suppose that $\mathcal{I}(\Psi)$ is the set environment
$[X \mapsto \{a, b\}]$. Under the normal interpretation, the environments in
$\mathcal{I}(\Psi)$ are considered one at a time and applied to the program
term $pair(X, X)$ to obtain a binding for $X$. The result is that
$\mathcal{I}(\Psi[X \mapsto pair(X, X)])$ yields the collection of
environments $\{[X \mapsto pair(a, a)], [X \mapsto pair(b, b)]\}$. However, if
$\mathcal{I}(\Psi)$ is applied to $pair(X, X)$ as a set environment, then each
occurrence of $X$ is treated as a set, resulting in
$\{[X \mapsto pair(a, a)], [X \mapsto pair(a, b)], [X \mapsto pair(b, a)], [X \mapsto pair(b, b)]\}$.
Hence, by using set environments to interpret program terms and conditions, we
obtain a natural method for eliminating inter-variable dependencies.
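
The contrast between the two readings of the assignment X := pair(X, X) can be reproduced with a few lines of Python. This is a sketch for the single-variable example only; the tuple encoding `("pair", left, right)` and the function names are assumptions of the illustration.

```python
from itertools import product


def standard_pair_assignment(envs):
    """Standard reading: apply each environment to pair(X, X) separately."""
    return [{"X": ("pair", env["X"], env["X"])} for env in envs]


def set_based_pair_values(x_values):
    """Set based reading: each occurrence of X ranges over the whole set."""
    return {("pair", left, right) for left, right in product(x_values, repeat=2)}


envs = [{"X": "a"}, {"X": "b"}]
standard = standard_pair_assignment(envs)
set_based = set_based_pair_values({"a", "b"})
```

The standard reading yields two bindings for `X`, while the set based reading yields four, because the two occurrences of `X` are no longer forced to agree.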

We now outline how this approach can be generalized to interpret arbitrary
environment expressions. This forms the core part of the definition of set
based interpretation of environment constraints. In particular we shall
develop interpretations of the expressions $\Psi[X \mapsto t]$, $\Psi[cond]$
and $[A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]$. In the remainder of
this section, let $\mathcal{I}$ denote an interpretation that maps environment
variables into set environments. Let $\mathcal{A}$ denote the operator that
maps a collection $\Theta$ of environments into a set environment, denoted
$\mathcal{A}(\Theta)$, as follows:

$$\mathcal{A}(\Theta)(X) = \{\rho(X) : \rho \in \Theta\}.$$

Note that $\mathcal{A}(\Theta)$ is the smallest set environment containing
$\Theta$.

Consider first the set based interpretation of an environment expression
of the form $\Psi[X \mapsto t]$ corresponding to an assignment statement. The
standard interpretation of such an expression is

$$\mathcal{I}(\Psi[X \mapsto t]) = \{\rho[X \mapsto \rho(t)] : \rho \in \Theta\} \text{ where } \Theta \text{ is } \{\rho \in \mathcal{I}(\Psi) : \rho \triangleright t\}.$$

Now, the set based interpretation of this expression is obtained by treating
$\Theta$ as a set environment $\varrho$ and applying $\varrho$ directly to the
program term $t$ instead of applying $\Theta$ to $t$ on an environment by
environment basis. The resulting set of values is then used to appropriately
update $\varrho$. Specifically, we define that the set based interpretation of
$\Psi[X \mapsto t]$ under $\mathcal{I}$ is given by

$$\mathcal{I}(\Psi[X \mapsto t]) = \varrho[X \mapsto \varrho(t)] \text{ where } \varrho \text{ is } \mathcal{A}(\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright t\}).$$
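
The set based interpretation of an assignment can be sketched in Python for finite collections. The sketch makes several simplifying assumptions that are not part of the text: terms contain no projections (so every term is defined under every environment and the definedness filter is trivial), terms are nested tuples `("var", name)` or `("app", f, t1, ..., tn)`, values are strings or tuples, and the helper names `abstract`, `eval_set`, `update` and `set_based_assign` are invented here.

```python
from itertools import product


def abstract(envs, variables):
    """The operator A: collect, per variable, every value it takes in envs."""
    return {x: {env[x] for env in envs} for x in variables}


def eval_set(senv, term):
    """Apply a set environment to a term, one set per variable occurrence."""
    if term[0] == "var":
        return set(senv[term[1]])
    _, fn, *args = term
    arg_sets = [eval_set(senv, arg) for arg in args]
    return {(fn, *combo) for combo in product(*arg_sets)}


def update(senv, x, values):
    """senv[x -> values]; all-empty if values or senv[x] is empty."""
    if not values or not senv[x]:
        return {y: set() for y in senv}
    updated = {y: set(vals) for y, vals in senv.items()}
    updated[x] = set(values)
    return updated


def set_based_assign(envs, variables, x, term):
    """Set based meaning of Psi[x -> term] for a finite collection envs."""
    senv = abstract(envs, variables)
    return update(senv, x, eval_set(senv, term))


theta = [{"X": "a", "Y": "b"}, {"X": "c", "Y": "d"}]
pair_xy = ("app", "pair", ("var", "X"), ("var", "Y"))
result = set_based_assign(theta, ["X", "Y"], "X", pair_xy)
```

Under the standard interpretation only `pair(a, b)` and `pair(c, d)` would be bound to `X`; here `result["X"]` contains all four combinations, since the link between `X` and `Y` in `theta` is discarded by `abstract`.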

Next consider an environment expression of the form $\Psi[cond]$
corresponding to imperative program conditions. Now, the effect of $[cond]$ is
to restrict environments in $\mathcal{I}(\Psi)$. In the standard
interpretation of conditions, environments are applied to program terms one at
a time, and a notion of variable dependency can arise that is similar to the
dependencies arising in the interpretation of $\Psi[X \mapsto t]$. For
example, suppose that $f$ is a binary function symbol and consider the program
consisting of the single statement
if $(f_{(1)}^{-1}(Y) = X \wedge f_{(2)}^{-1}(Y) = X)$ then $Seq$. Let the label
of the first statement of $Seq$ be $\beta$ and let the label of the last be
$\gamma$. Writing $\alpha$ for the label of the if statement and $\delta$ for
the label of the program point that follows it, the environment constraints
for this program consist of the constraints

$$\Psi^\alpha \supseteq \top$$

$$\Psi^\beta \supseteq \Psi^\alpha[f_{(1)}^{-1}(Y) = X \wedge f_{(2)}^{-1}(Y) = X]$$

$$\Psi^\delta \supseteq \Psi^\alpha[\neg(f_{(1)}^{-1}(Y) = X) \vee \neg(f_{(2)}^{-1}(Y) = X)]$$

together with the appropriate constraints for $Seq$. For this program,
$approx_P$ maps $\Psi^\alpha$ into the set of all environments and maps
$\Psi^\beta$ into the collection of environments

$$\{[X \mapsto u, Y \mapsto f(v, v)] : \text{for all } u \text{ and } v\}.$$

Again, dependencies are introduced through the treatment of distinct
occurrences of variables.

To ignore such dependencies, we develop an interpretation of environment
expressions $\Psi[cond]$ using set environments. First consider the standard
interpretation of an expression of the form $\Psi[cond]$:

$$\mathcal{I}(\Psi[cond]) = \{\rho \in \Theta : \rho \models cond\} \text{ s.t. } \Theta \text{ is } \{\rho \in \mathcal{I}(\Psi) : \rho \triangleright cond\}.$$

This involves three components: (i) the process of collecting environments
$\rho$ such that $\rho \in \mathcal{I}(\Psi)$ and $\rho \triangleright cond$
into a set $\Theta$, (ii) the relation $\rho \models cond$, and (iii) the
process of collecting environments $\rho$ such that $\rho \in \Theta$ and
$\rho \models cond$. These three components are replaced by set environment
counterparts as follows: (i) is replaced by the process of collecting
environments $\rho$ such that $\rho \in \mathcal{I}(\Psi)$ and
$\rho \triangleright cond$ into a set environment $\varrho$, (ii) is replaced
by the relation $\rho \models_{\varrho} cond$ that defines the notion of an
environment $\rho$ satisfying $cond$ in the context of set environment
$\varrho$, and (iii) is replaced by the process of collecting environments
$\rho$ such that $\rho \in \varrho$ and $\rho \models_{\varrho} cond$ into a
set environment. We now outline the definition of
$\rho \models_{\varrho} cond$; the full details can be found in the next
section.

Consider the atomic condition $f^{-1}_{(1)}(Y) = X$. Such a condition
essentially represents two restrictions on an environment $\rho$. The left
hand side of the equality restricts the values of $Y$ and the right hand side
restricts $X$. In other words, $\rho \models f^{-1}_{(1)}(Y) = X$ can be
thought of as the combination of the two restrictions
(i) $\rho(f^{-1}_{(1)}(Y)) \in \{\rho(X)\}$ and
(ii) $\rho(X) \in \{\rho(f^{-1}_{(1)}(Y))\}$. In the set based interpretation
of this basic condition, these two restrictions are explicitly separated, and
the set environment $\varrho$ is used to interpret $X$ in (i) and
$f^{-1}_{(1)}(Y)$ in (ii). Hence, $\rho \models_{\varrho} f^{-1}_{(1)}(Y) = X$
is satisfied if

$$\rho(f^{-1}_{(1)}(Y)) \in \varrho(X) \text{ and } \rho(X) \in \varrho(f^{-1}_{(1)}(Y)).$$

Using this interpretation, consider again the environment expression
$\Psi^{\uparrow 1}[f^{-1}_{(1)}(Y) = X \wedge f^{-1}_{(2)}(Y) = X]$ and
suppose that $\mathcal{I}(\Psi^{\uparrow 1})$ is the set of all environments.
Instead of obtaining the set
$\{[X \mapsto u, Y \mapsto f(v,v)] : \text{for all } u \text{ and } v\}$, we
now obtain the set
$\{[X \mapsto u, Y \mapsto f(v_1,v_2)] : \text{for all } u, v_1 \text{ and } v_2\}$.
Hence dependencies are no longer introduced.

The interpretation of an atomic condition $\neg(s = t)$ can similarly be
obtained. Again $\rho \models \neg(s = t)$ can be split into
(i) $\exists v(\rho(s) \neq v \wedge v \in \{\rho(t)\})$ and
(ii) $\exists v(\rho(t) \neq v \wedge v \in \{\rho(s)\})$. The set based
interpretation of this basic condition uses $\varrho$ to interpret $t$ in (i)
and $s$ in (ii). Specifically, $\rho \models_{\varrho} \neg(s = t)$ holds if

$$\exists v(v \in \varrho(t) \wedge v \neq \rho(s)) \text{ and } \exists v(v \in \varrho(s) \wedge v \neq \rho(t)).$$

In other words, whenever a term is used in such a way that a new value
is built up, then a set environment is used to interpret the term (since
otherwise infinite dependencies may be introduced). In contrast, if a term is
used for the purpose of restricting the values of program variables, then the
term is interpreted using a (normal) environment (because no dependencies
may be introduced by this process). Note that atomic conditions such as
$match_f(s)$ and $\neg(match_f(s))$ serve only to restrict the values of the
program variables contained in $s$, and so they cannot introduce
dependencies. Hence set environments are not needed for their interpretation.

Finally, consider an expression
$[A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]$ corresponding to a logic
program rule. Again the standard interpretation of this expression may
introduce dependencies through multiple occurrences of a variable. For
example, consider the logic program consisting of the rule eq(X,X) (labeled
with 1) and the goal ← eq(Y, pair(Z,Z)) (labeled with 2). The bottom-up
environment constraints for this program are

$$\Psi^1 \supseteq [\,]$$

$$\Psi^2 \supseteq [eq(Y, pair(Z,Z)) \in eq(X,X).\Psi^1]$$

and $approx_P$ maps $\Psi^2$ into the collection of environments

$$\{[X \mapsto u, Y \mapsto pair(v,v)] : \text{for all } u \text{ and } v\}.$$

Dependencies are introduced here through the two occurrences of Z.

Set environments can be used to eliminate such dependencies in a manner
similar to that used for environment expressions of the form
$\Psi[X \mapsto t]$. First, recall that the standard interpretation
$\mathcal{I}([A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n])$ of such an
expression is

$$\{\rho : \text{for each } i,\ \rho(A_i) \in \{\rho'(B_i) : \rho' \in \mathcal{I}(\Psi_i)\}\}.$$

This contains two kinds of components: (i) the application of a set
of environments $\mathcal{I}(\Psi_i)$ to the atom $B_i$ to obtain a set of
ground atoms, and (ii) the collection of environments $\rho$. In the set
interpretation of this expression, components in (i) are replaced by the
application of the set environment $\mathcal{I}(\Psi_i)$ to the atom $B_i$,
and the component in (ii) is replaced by the collection of environments
$\rho$ into a set environment. For example, the constraint
$\Psi^2 \supseteq [eq(Y, pair(Z,Z)) \in eq(X,X).\Psi^1]$ is interpreted as

$$\mathcal{I}(\Psi^2) \supseteq \mathcal{A}(\{\rho : \rho(eq(Y, pair(Z,Z))) \in \varrho(eq(X,X)) \text{ where } \varrho \text{ is } \mathcal{I}(\Psi^1)\}).$$

More generally, the environment expression
$[A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]$ is interpreted under
$\mathcal{I}$ as the set

$$\mathcal{I}([A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]) = \mathcal{A}(\{\rho : \rho(A_i) \in \mathcal{I}(\Psi_i)(B_i),\ i = 1..n\}).$$

In summary, the use of set environments to interpret environment
constraints corresponds to treating each variable as a set of values.
Moreover, the use of sets in the interpretation of environment constraints
provides a simple and natural way to ignore all dependencies that are
introduced through the action of variables. For this reason we equate set
based analysis with analysis in which all inter-variable dependencies are
ignored (and no other approximations are made). Importantly, this uniform
and intuitive reading of environment constraints leads to an accurate and
decidable analysis.


5.3  Set Based Interpretation of $\mathcal{EC}_P$


We now present the complete details of the set based interpretation of the
environment constraints. We begin by summarizing some definitions
introduced in the previous section. A set environment $\varrho$ is a mapping
from program variables into sets of values such that if $\varrho$ maps some
program variable into the empty set then it maps all program variables into
the empty set. We identify set based collections of environments with set
environments, and write $\rho \in \varrho$ to denote that
$\rho(X) \in \varrho(X)$ for each program variable $X$. If $\varrho$ is a set
environment and $S$ is a set of values then $\varrho[X \mapsto S]$ denotes the
set environment that maps all variables into the empty set if either $S$ or
$\varrho(X)$ is the empty set, and otherwise is the set environment that maps
$X$ into $S$ and agrees with $\varrho$ on program variables different from
$X$. $\mathcal{A}$ denotes the operator that maps a collection $\Theta$ of
environments into the smallest set environment containing $\Theta$ and is
defined as follows:

$$\mathcal{A}(\Theta)(X) = \{\rho(X) \mid \rho \in \Theta\}.$$

Note that the fixed points of $\mathcal{A}$ are exactly the set based
collections of environments.
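
As a small illustration of these definitions, the following Python sketch
computes the smallest set environment for a finite collection of
environments and checks whether the collection is a fixed point. The
dictionary encoding and the names `smallest_set_env`, `members` and `update`
are assumptions chosen for this sketch; the thesis works with arbitrary
(possibly infinite) collections.

```python
from itertools import product

def smallest_set_env(envs, variables):
    """The operator A: map each variable to the set of its values in envs."""
    return {x: {env[x] for env in envs} for x in variables}

def members(set_env):
    """All environments rho with rho(X) in set_env(X) for every variable X."""
    names = sorted(set_env)
    choices = [sorted(set_env[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*choices)]

def update(set_env, x, values):
    """set_env[x -> values]; everything collapses to the empty set if
    either values or the old set for x is empty."""
    if not values or not set_env[x]:
        return {y: set() for y in set_env}
    return {**set_env, x: set(values)}

# Two environments in which X and Y always agree.
theta = [{"X": "a", "Y": "a"}, {"X": "b", "Y": "b"}]
rho = smallest_set_env(theta, ["X", "Y"])
closure = members(rho)
is_set_based = all(env in theta for env in closure)
bottom = update(rho, "X", set())
print(len(closure), is_set_based)
```

Here `theta` is not set based: its closure also contains the mixed
environments [X ↦ a, Y ↦ b] and [X ↦ b, Y ↦ a], so applying the operator
discards the dependency between X and Y.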

Just as (pointwise) environments are extended to become partial functions
from program terms into values, so set environments are extended to
become functions from program terms into sets of values. Let $\varrho$ be a
set environment and let $t$ be a program term. If $\varrho$ is the set
environment $\bot$ that maps all variables into the empty set then
$\varrho(t)$ is the empty set, regardless of $t$. Otherwise $\varrho(t)$ is
defined as follows:


•  If $t$ is a program variable, then $\varrho(t)$ is already defined.

•  If $t$ is $f(t_1, \ldots, t_n)$ then $\varrho(t)$ is $\{f(v_1, \ldots, v_n) : v_i \in \varrho(t_i)\}$.

•  If $t$ is $f^{-1}_{(i)}(s)$ then $\varrho(t)$ is $\{v_i : f(v_1, \ldots, v_n) \in \varrho(s)\}$.
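
The three cases above can be sketched directly for finite set environments.
In the following Python fragment the term encoding is an assumption made for
illustration: a variable is a string, a constructor term is a tuple
`(f, t1, ..., tn)`, and a projection is the tuple `("proj", f, i, s)` with
`i` counted from 1 (so no constructor may itself be named "proj").

```python
from itertools import product

def apply_set_env(set_env, term):
    """Apply a finite set environment to a term, giving a set of values."""
    if set_env and all(not values for values in set_env.values()):
        return set()                      # bottom maps every term to {}
    if isinstance(term, str):             # program variable
        return set(set_env[term])
    if term[0] == "proj":                 # projection: i-th argument of f
        _, f, i, s = term
        return {v[i] for v in apply_set_env(set_env, s)
                if v[0] == f and len(v) > i}
    f, *args = term                       # constructor f(t1, ..., tn)
    arg_sets = [apply_set_env(set_env, a) for a in args]
    return {(f, *vs) for vs in product(*arg_sets)}

A, B = ("a",), ("b",)
rho = {"X": {A, B}, "Y": {("pair", A, B), ("c",)}}
built = apply_set_env(rho, ("pair", "X", "X"))
projected = apply_set_env(rho, ("proj", "pair", 1, "Y"))
print(len(built), projected)
```

Note that the two occurrences of X in pair(X, X) are expanded
independently, and that the projection silently skips values of Y that are
not built with pair.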

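We now use set environments to interpret program conditions. We define
a relation $\rho \models_{\varrho} cond$ to be read as $\rho$ satisfies
$cond$ in the context of the set environment $\varrho$. As noted previously,
it is assumed that program conditions are first written into disjunctive
normal form. Let $\rho$ be an environment, let $\varrho$ be a set environment
and define that:

•  $\rho \models_{\varrho} s = t$ iff $\rho(s) \in \varrho(t)$ and $\rho(t) \in \varrho(s)$.

•  $\rho \models_{\varrho} \neg(s = t)$ iff $\exists v(v \in \varrho(t) \wedge v \neq \rho(s))$ and $\exists v(v \in \varrho(s) \wedge v \neq \rho(t))$.

•  $\rho \models_{\varrho} match_f(s)$ if $\rho(s)$ is of the form $f(v_1, \ldots, v_n)$.

•  $\rho \models_{\varrho} \neg match_f(s)$ if $\rho(s)$ is not of the form $f(v_1, \ldots, v_n)$.

•  $\rho \models_{\varrho} cond_1 \wedge cond_2$ iff $\rho \models_{\varrho} cond_1$ and $\rho \models_{\varrho} cond_2$.

•  $\rho \models_{\varrho} cond_1 \vee cond_2$ iff either $\rho \models_{\varrho} cond_1$ or $\rho \models_{\varrho} cond_2$.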
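
The relation defined above can be sketched for finite set environments as
follows. The encoding is hypothetical and chosen only for this illustration:
terms are encoded as strings (variables), tuples `(f, t1, ..., tn)` and
projections `("proj", f, i, s)`; conditions are tuples tagged "eq", "neq",
"match", "nomatch", "and" and "or". The sketch assumes a non-empty set
environment and an environment that is defined on the condition.

```python
from itertools import product

def eval_env(env, term):
    """rho(t): a value, or None when rho is not defined on t."""
    if isinstance(term, str):
        return env[term]
    if term[0] == "proj":
        _, f, i, s = term
        v = eval_env(env, s)
        if v is None or v[0] != f or len(v) <= i:
            return None
        return v[i]
    f, *args = term
    vals = [eval_env(env, a) for a in args]
    return None if None in vals else (f, *vals)

def eval_set(set_env, term):
    """Set environment applied to a term (non-bottom case only)."""
    if isinstance(term, str):
        return set(set_env[term])
    if term[0] == "proj":
        _, f, i, s = term
        return {v[i] for v in eval_set(set_env, s) if v[0] == f and len(v) > i}
    f, *args = term
    return {(f, *vs) for vs in product(*(eval_set(set_env, a) for a in args))}

def satisfies(env, set_env, cond):
    """env satisfies cond in the context of set_env."""
    tag = cond[0]
    if tag == "eq":
        _, s, t = cond
        return (eval_env(env, s) in eval_set(set_env, t)
                and eval_env(env, t) in eval_set(set_env, s))
    if tag == "neq":
        _, s, t = cond
        vs, vt = eval_env(env, s), eval_env(env, t)
        return (any(v != vs for v in eval_set(set_env, t))
                and any(v != vt for v in eval_set(set_env, s)))
    if tag in ("match", "nomatch"):
        _, f, s = cond
        v = eval_env(env, s)
        return (v is not None and v[0] == f) == (tag == "match")
    _, c1, c2 = cond
    if tag == "and":
        return satisfies(env, set_env, c1) and satisfies(env, set_env, c2)
    return satisfies(env, set_env, c1) or satisfies(env, set_env, c2)

A, B = ("a",), ("b",)
varrho = {"X": {A, B}, "Y": {("f", u, v) for u in (A, B) for v in (A, B)}}
first, second = ("proj", "f", 1, "Y"), ("proj", "f", 2, "Y")
cond = ("and", ("eq", first, "X"), ("eq", second, "X"))
rho = {"X": A, "Y": ("f", A, B)}
standard = eval_env(rho, first) == rho["X"] and eval_env(rho, second) == rho["X"]
set_based = satisfies(rho, varrho, cond)
print(standard, set_based)
```

The environment [X ↦ a, Y ↦ f(a, b)] fails the condition in the ordinary
sense, since the second component of Y differs from X, but it satisfies the
condition in the context of `varrho`. This mirrors the earlier example in
which Y ranges over all f(v1, v2) rather than only f(v, v).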

A set based interpretation $\mathcal{I}$ is a mapping from each environment
variable into a set environment. Such a mapping can be extended to map from
environment expressions $ee$ into a set environment as follows.

•  $\mathcal{I}(\Psi)$ is already defined.

•  $\mathcal{I}(\top) = \{\text{all environments}\}$.

•  $\mathcal{I}(\Psi[X \mapsto t]) = \varrho[X \mapsto \varrho(t)]$ where $\varrho$ is $\mathcal{A}(\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright t\})$.

•  $\mathcal{I}(\Psi[cond]) = \mathcal{A}(\{\rho : \rho \in \varrho,\ \rho \models_{\varrho} cond\})$ where $\varrho$ is $\mathcal{A}(\{\rho : \rho \in \mathcal{I}(\Psi),\ \rho \triangleright cond\})$.

•  $\mathcal{I}([A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]) = \mathcal{A}(\{\rho : \rho(A_1) \in \mathcal{I}(\Psi_1)(B_1), \ldots, \rho(A_n) \in \mathcal{I}(\Psi_n)(B_n)\})$.

Note that in the last part of this definition, each $\mathcal{I}(\Psi_i)$ is
a set environment, and so the expression $\mathcal{I}(\Psi_i)(B_i)$ denotes
the set of ground atoms resulting from applying $\mathcal{I}(\Psi_i)$ to
$B_i$. A set based model of a collection of environment constraints is a set
based interpretation that satisfies each constraint in the collection. Using
set based models, we can now define the set based approximation of a
program.


Definition 12 (Set Based Approximation) Let $P$ be a program and let
$\mathcal{EC}_P$ be the environment constraints of $P$. Then the set based
approximation of $P$, denoted $sba_P$, is the least set based model of
$\mathcal{EC}_P$. □


Importantly, $sba_P$ is a model of $\mathcal{EC}_P$.


Proposition 13 For all programs $P$, $sba_P$ is a model of $\mathcal{EC}_P$.


Proof: The proposition is proved by showing that any set based model of
$\mathcal{EC}_P$ is a model of $\mathcal{EC}_P$. To this end, let
$\mathcal{I}$ be a set based interpretation. Now, when $\mathcal{I}$ is
extended to map from environment expressions into sets of environments,
either the set based interpretation rules can be used, or else $\mathcal{I}$
can be treated as a (normal) interpretation and the rules for (normal)
interpretation used. Let $SET(\mathcal{I}, ee)$ denote the set of
environments obtained when the environment expression $ee$ is interpreted
under $\mathcal{I}$ using the set based interpretation rules. Let
$NML(\mathcal{I}, ee)$ denote the set of environments obtained when the
environment expression $ee$ is interpreted under $\mathcal{I}$ using the
(normal) interpretation rules. To prove the proposition, it is sufficient
to show that $SET(\mathcal{I}, ee) \supseteq NML(\mathcal{I}, ee)$.

Clearly if $ee$ is either an environment variable or $\top$ then
$SET(\mathcal{I}, ee) = NML(\mathcal{I}, ee)$. Before proving the remaining
three cases, it is convenient to prove that

$$\text{if } \rho \in \varrho \text{ then } \rho(t) \in \varrho(t) \qquad (5.8)$$

where $t$ is a program term and $\rho$ is an environment that is defined on
$t$. This fact can be easily established by structural induction on $t$.

Using (5.8), it is now straightforward to complete the proof. First suppose
that $ee$ is $\Psi[X \mapsto t]$. If
$\rho \in NML(\mathcal{I}, \Psi[X \mapsto t])$ then there exists an
environment $\rho'$ such that $\rho' \triangleright t$,
$\rho' \in \mathcal{I}(\Psi)$ and $\rho = \rho'[X \mapsto \rho'(t)]$. Now, let
$\varrho$ be $\mathcal{A}(\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright t\})$.
Clearly $\rho' \in \varrho$ and so $\rho'(t) \in \varrho(t)$ by (5.8). It
follows from the definition of $\varrho[X \mapsto S]$ that
$\rho \in \varrho[X \mapsto \varrho(t)]$.

Now consider an environment expression of the form $\Psi[cond]$. Using
(5.8) it is easy to verify the following property by structural induction on
$cond$:

$$\text{if } \rho \triangleright cond,\ \rho \models cond, \text{ and } \rho \in \varrho \text{ then } \rho \models_{\varrho} cond \qquad (5.9)$$

where $cond$ is a program condition and $\varrho$ is a set environment. To
complete the proof for $\Psi[cond]$, suppose that
$\rho \in NML(\mathcal{I}, \Psi[cond])$. This implies that
$\rho \in \mathcal{I}(\Psi)$, $\rho \triangleright cond$ and
$\rho \models cond$. Let $\varrho$ be
$\mathcal{A}(\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright cond\})$.
Clearly $\rho \in \varrho$ and so $\rho \models_{\varrho} cond$ by (5.9).
Hence $\rho \in SET(\mathcal{I}, \Psi[cond])$.

Finally, consider $[A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]$ and
suppose that
$\rho \in NML(\mathcal{I}, [A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n])$.
This implies that for $i = 1..n$, there exists an environment $\rho_i$ such
that $\rho(A_i) = \rho_i(B_i)$ and $\rho_i \in \mathcal{I}(\Psi_i)$. From
(5.8) it follows that $\rho(A_i) \in \mathcal{I}(\Psi_i)(B_i)$. Hence
$\rho \in SET(\mathcal{I}, [A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n])$.

□

This proposition implies that $sba_P \supseteq lm(\mathcal{EC}_P)$. Since we
have already proved that $lm(\mathcal{EC}_P)$ corresponds to $\mathcal{CS}_P$
(see Theorem 1 for imperative programs and Theorem 3 for logic programs) it
follows that $sba_P$ is a conservative approximation of the collecting
semantics of a program in the following sense:

Theorem 6 (Correctness of $sba_P$)

•  For imperative programs $P$, $sba_P \supseteq \mathcal{CS}_P$.

•  For logic programs $P$,

sbap{^°‘)  =var(a)  (p  •  P  h  ^  ^  €  C«Sp(a)},  for  all  labels  a. 

D 


We conclude this section by noting that the uses of $\mathcal{A}$ in the
definition of set based interpretation could have been removed without
altering the definition of $sba_P$. They have been retained to emphasize
that various objects in the definition are set environments. To see why they
do not affect $sba_P$, first number the four occurrences of $\mathcal{A}$ in
order so that the first occurrence appears in the interpretation of
$\Psi[X \mapsto t]$, the second and third (numbered left-to-right) appear in
the interpretation of $\Psi[cond]$ and the last appears in the
interpretation of $[A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]$. Now,
consider the second and last occurrences of $\mathcal{A}$. These occurrences
ensure that the set based interpretation of an environment expression is
always a set based collection of environments. However, when determining
whether a set based interpretation is a model of a constraint
$\Psi \supseteq ee$, the difference between retaining these occurrences and
omitting them reduces to the difference between the formulas

$$\mathcal{I}(\Psi) \supseteq \mathcal{A}(S_{ee,\mathcal{I}}) \text{ and } \mathcal{I}(\Psi) \supseteq S_{ee,\mathcal{I}}$$

where $S_{ee,\mathcal{I}}$ is a set of environments dependent on $ee$ and
$\mathcal{I}$. Since $\mathcal{I}(\Psi)$ is required to be a set
environment, these two formulas are equivalent.

Now consider the first and third occurrences of $\mathcal{A}$. These
occurrences are applied to environment sets of the form
$\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright t\}$ or
$\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright cond\}$ to ensure that a
set environment is obtained. However, the following proposition shows that
if $\mathcal{I}(\Psi)$ is a set based collection of environments, then
$\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright t\}$ and
$\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright cond\}$ will also be set
based collections of environments and so the application of $\mathcal{A}$
will not have any effect.

Proposition 14 Let $\Theta$ be a set based collection of environments and
let $t_1, \ldots, t_r$ be program terms. Then
$\{\rho \in \Theta : \bigwedge_{k=1..r} \rho \triangleright t_k\}$ is a set
based collection of environments.

Proof: The proposition is established by showing that

$$\mathcal{A}(\{\rho \in \Theta : \textstyle\bigwedge_{k=1..r} \rho \triangleright t_k\}) \subseteq \{\rho \in \Theta : \textstyle\bigwedge_{k=1..r} \rho \triangleright t_k\}.$$

Let $\rho \in \mathcal{A}(\{\rho \in \Theta : \bigwedge_{k=1..r} \rho \triangleright t_k\})$;
it remains to show that
$\rho \in \{\rho \in \Theta : \bigwedge_{k=1..r} \rho \triangleright t_k\}$.
From the definition of $\mathcal{A}$ it follows that for each program
variable $X$ there exists an environment $\rho_X$ such that
$\rho_X \in \{\rho \in \Theta : \bigwedge_{k=1..r} \rho \triangleright t_k\}$
and $\rho(X) = \rho_X(X)$. Since each $\rho_X \in \Theta$ and $\Theta$ is set
based, it follows that $\rho \in \Theta$. Hence to show that
$\rho \in \{\rho \in \Theta : \bigwedge_{k=1..r} \rho \triangleright t_k\}$,
it suffices to show that $\rho \triangleright t_k$, $k = 1..r$. The proof of
this proceeds by structural induction on each $t_k$. The induction
hypothesis is that $\rho \triangleright t_k$ and that either


(a) for some program variable $X$, $\rho_X \triangleright t_k$ and $\rho(t_k) = \rho_X(t_k)$, or

(b) there exist subterms $s_1, \ldots, s_n$ of $t_k$ such that for any
environment $\rho'$, $\rho' \triangleright t_k$ implies that
$\rho' \triangleright s_i$, $i = 1..n$, and
$\rho'(t_k) = f(\rho'(s_1), \ldots, \rho'(s_n))$.


First suppose that $t_k$ is a variable, say $X$. Then
$\rho \triangleright t_k$ and $\rho(t_k) = \rho_X(t_k)$, and so the induction
hypothesis holds with condition (a).

Now suppose that $t_k$ is of the form $f(s_1, \ldots, s_n)$. Since each $s_i$
satisfies the induction hypothesis, it follows that each $\rho(s_i)$ is
defined and so $\rho \triangleright t_k$. Also, it is immediate that for any
environment $\rho'$, $\rho' \triangleright t_k$ implies that
$\rho' \triangleright s_i$, $i = 1..n$, and
$\rho'(t_k) = f(\rho'(s_1), \ldots, \rho'(s_n))$. Hence $t_k$ satisfies
condition (b).

The remaining case is where $t_k$ is of the form $f^{-1}_{(i)}(s)$. Now, on
applying the induction hypothesis to $s$, it follows that
$\rho \triangleright s$ and $s$ satisfies either (a) or (b). First suppose
that $s$ satisfies (a). Then $\rho(s) = \rho_X(s)$ for some program variable
$X$, and since $\rho_X \triangleright f^{-1}_{(i)}(s)$, it follows that
$\rho_X(s)$ must be of the form $f(\cdots)$. Hence $\rho(f^{-1}_{(i)}(s))$ is
defined and is in fact equal to $\rho_X(f^{-1}_{(i)}(s))$, and so $t_k$
satisfies case (a). Now suppose that $s$ satisfies (b). Then there exist
subterms $s_1, \ldots, s_n$ of $s$ such that for any environment $\rho'$,
$\rho' \triangleright s$ implies that $\rho' \triangleright s_i$,
$i = 1..n$, and $\rho'(s) = g(\rho'(s_1), \ldots, \rho'(s_n))$. This has a
number of consequences. First, since
$\rho_X \triangleright f^{-1}_{(i)}(s)$, it must be the case that $f = g$.
Second, since $\rho \triangleright s$, it must be the case that
$\rho(f^{-1}_{(i)}(s))$ is defined, and furthermore, that
$\rho(f^{-1}_{(i)}(s)) = \rho(s_i)$. On applying the induction hypothesis to
$s_i$, it is clear that $t_k$ respectively satisfies (a) or (b) if $s_i$
satisfies (a) or (b). □


5.4  Alternative Definitions


The basic goal of set based analysis is to obtain a very simple definition of
approximation based on the notion of ignoring dependencies arising from
the behavior of variables. However, this notion can potentially be realized
in a number of different ways, and the definition of set based program
approximation presented in the previous section represents a choice among a
number of possible definitions. We now outline the major alternatives and
compare them with set based analysis. In particular, we shall argue that
set based analysis is the most natural choice, given the requirements of
decidability, accuracy and simplicity.


Language  Restrictions 

Perhaps the simplest definition of approximation that employs the idea of
ignoring inter-variable dependencies is $approx_P$. We have already shown
that $approx_P$ is not decidable and that it does not ignore all
inter-variable dependencies. In essence, the language operations are
sufficiently powerful that unbounded dependencies can be introduced even
when all collections of environments are free from inter-variable
dependencies. One way to address this problem is to restrict the language so
that the language operations cannot by themselves introduce unbounded
dependencies. This approach was used in an early version of set based
analysis for imperative programs reported by Heintze and Jaffar in [23]. In
essence, this paper obtains a decidable program approximation based on
$approx_P$ by restricting imperative programs in the following two ways:

(i) Assignment statements must have the form $X := f(X_1, \ldots, X_n)$ where
the $X_i$ are distinct;

(ii) Program conditions must have one of the following forms:

(a) $X = Y$ where $X$ and $Y$ are program variables;

(b) $match_f(X)$ where $X$ is a program variable, or

(c) a negation of (a) or (b).


Intuitively, these restrictions ensure that multiple occurrences of program
variables cannot occur in a term. For example, a statement such as
Y := pair(X,X) cannot be written. This implies that, in isolation,
conditions and assignments cannot introduce inter-variable dependencies. In
other words, the only form of inter-variable dependency that arises in such
programs is dependency between variable values in collections of
environments. Since $approx_P$ ignores all such dependencies, it follows
that for programs satisfying (i) and (ii), the approximation $approx_P$ is
free of inter-variable dependencies. In fact, for such programs we can show
that $approx_P = sba_P$. Hence, the set based approximation of imperative
programs described in this thesis can be viewed as a conservative extension
of the approximations defined in [23].

The main drawback of using the subclass of imperative programs defined
by (i) and (ii) is that it is unreasonably restrictive. Although any
imperative program $P$ can be transformed (by "unfolding" complex
assignments and conditions) into a semantically equivalent program $P'$ that
satisfies (i) and (ii), the transformation from $P$ to $P'$ forgets much of
the structure of $P$ and this has a detrimental effect on the accuracy of
the analysis. In particular, we can show that $approx_{P'} \supseteq sba_P$.
Despite this drawback, this approach does have one appealing property. In
essence the approximation $approx_P$ corresponds to a Hoare-style reasoning
about an imperative program using assertions that do not express information
about inter-variable dependencies. Specifically, consider an assertion
language consisting of formulas of the form
$\Phi_1 \wedge \cdots \wedge \Phi_n$ where each $\Phi_i$ is a formula
containing at most one free variable. Let $\Phi^p$ be the strongest
assertion that can be proved for the program point $p$ and let
$\rho \models \Phi$ denote that $\rho$ satisfies the formula $\Phi$. Then
$\rho \models \Phi^p$ iff $\rho \in approx_P(\Psi^p)$.

We finally note that the imperative language used in this thesis employs a
moderate language restriction. Recall from Chapter 3 that atomic program
conditions of the form $s = t$ are such that $s$ and $t$ are constructed from
program variables and projection symbols. A more general language could
be defined in which $s$ and $t$ are arbitrary program terms. However, $sba_P$
for this language would not be decidable. Intuitively this is because the
combination of function symbols and projection symbols allows a form of
unbounded dependency to be introduced. To illustrate the reason for this,
let $f$ and $g$ respectively be unary and binary function symbols, let $cond$
be the program condition




and consider the set based interpretation of the environment expression
$\Psi[cond]$. Suppose that $\mathcal{I}(\Psi)$ is the set environment that
maps $X$ into the set of all values and maps $Y$ into the singleton set
$\{g(a,a)\}$. Then, using the definition of set based interpretation,
$\mathcal{I}(\Psi[cond])$ is the set environment that maps $X$ into
$\{g(f(a),f(a))\}$. This example can be easily modified to show that sets of
the form $\{g(f^n(a), f^n(a)) : n \geq 0\}$ can be formed in $sba_P$, and
also to show that $sba_P$ for this extended language is undecidable (for a
related discussion, see Section 7.6 page 197).

Although  our  restriction  on  atomic  program  conditions  of  the  form  s  =  t 
is  very  significant  from  a  decidability  point  of  view,  it  is  inconsequential 
from  a  programming  point  of  view  because  complex  conditions  such  as  the 
condition  /(^J =  Y  axe  rarely  written  in  programs. 

We note that the restriction could be substantially relaxed to admit
conditions $s = t$ where $s$ and $t$ do not contain combinations of function
and projection symbols. Also note that it is easy to translate from an
arbitrary condition $s = t$ into an equivalent condition that is in our
language.
For  example  9(f^^i9^i)iX)),  ^  could  be  translated  into 

/(T) (%)(^))  =  %)(^)  ^  Moreover,  such  a  trans¬ 

lation  results  in  little  loss  of  information  in  practice. 


More  Direct  Use  of  Set  Environments 

We now present an alternative interpretation of environment constraints
that employs set environments in a very direct manner. Consider an
environment expression of the form
$[A_1 \in B_1.\Psi_1, \ldots, A_n \in B_n.\Psi_n]$. The standard
interpretation of such an expression under an interpretation $\mathcal{I}$
is:

$$\{\rho : \text{for each } i,\ \rho(A_i) = \rho_i(B_i) \text{ for some } \rho_i \in \mathcal{I}(\Psi_i)\}.$$

A very natural way to modify this interpretation to use set environments is

$$\bigsqcup \{\varrho : \text{for each } i,\ \varrho(A_i) = \varrho_i(B_i) \text{ for some } \varrho_i \subseteq \mathcal{I}(\Psi_i)\}$$

where $\bigsqcup$ denotes the pointwise union of a set of set environments
and $\subseteq$ denotes subset on set environments (again defined
pointwise). Such an approach can be extended in a straightforward manner to
the other kinds of environment constraints. Moreover, it is arguably simpler
than the definition of set based interpretation, and it is easy to verify
that it is more accurate.

    1.  p(f(a,a)).

    2.  p(V) ← q(V, f(W,W), f(s(W),s(W))).

    3.  q(X,Y,X) ← p(Y).

    $\Psi^1 \supseteq [\,]$

    $\Psi^2 \supseteq [q(V, f(W,W), f(s(W),s(W))) \in q(X,Y,X).\Psi^3]$

    $\Psi^3 \supseteq [p(Y) \in p(f(a,a)).\Psi^1]$

    $\Psi^3 \supseteq [p(Y) \in p(V).\Psi^2]$

Figure 5.2: Undecidability of Modified Set Based Interpretation


Unfortunately it leads to an undecidable notion of program approximation.
In essence, this is because inter-variable dependencies may arise and these
lead to unbounded dependencies. To see this, consider Figure 5.2, which
contains a logic program and its bottom-up environment constraints. Using
the alternative interpretation just outlined, the least interpretation that
is a model of these constraints is

$$\Psi^1 \mapsto \{\text{all environments}\}$$

$$\Psi^2 \mapsto [V \mapsto \{f(s^n(a), s^n(a)) : n \geq 0\},\ W \mapsto \{f(s^n(a), s^n(a)) : n \geq 0\}]$$

$$\Psi^3 \mapsto [X \mapsto \{\text{all values}\},\ Y \mapsto \{f(s^n(a), s^n(a)) : n \geq 0\}]$$

It is easy to modify this example to prove that the program approximation
arising from this interpretation of environment expressions is undecidable.


Ignoring  All  Dependencies 

Set based analysis ignores all inter-variable dependencies, but it does
retain certain notions of dependency that are not related to the treatment
of variables. For example, consider Figure 5.3, which shows an imperative
program along with $sba_P$ for this program at some selected program points.
The set of values for X specified by $sba_P$ at point $\uparrow 3$ exhibits
inter-argument dependencies in the sense that whenever the first argument of
cons is 1, the second argument is cons(2, nil), and whenever the first
argument is 2, the second argument is nil. If inter-argument dependencies
are ignored, then this set would be enlarged to include cons(2, cons(2, nil))
and cons(1, nil).

    1.  X := cons(1, cons(2, nil));

    2.  while X ≠ nil do

    3.      X := cdr(X);

    point                          $sba_P$ at selected points

    $\downarrow 1, \uparrow 2$     $X \mapsto \{cons(1, cons(2, nil))\}$

    $\uparrow 3$                   $X \mapsto \{cons(1, cons(2, nil)),\ cons(2, nil)\}$

    $\downarrow 3$                 $X \mapsto \{cons(2, nil),\ nil\}$

Figure 5.3: Inter-Argument Dependency Example

An important difference between these two kinds of dependency is that
ignoring inter-variable dependencies is sufficient for obtaining decidable
program approximations; it is not necessary to ignore inter-argument
dependencies. Intuitively, this is because, given a program $P$, the
inter-argument dependencies that are present in $approx_P$ are of a bounded
nature in the sense that they are due to the (finite) collection of program
terms that appear in $P$. In other words, no essentially new dependencies
can be generated. In contrast, dependencies introduced through variables are
potentially infinite, such as those introduced through a statement such as
X := pair(X,X).

Several approaches to program approximation based on ignoring
inter-argument dependencies have been proposed in the literature (see for
example [48, 68]) and we shall consider these in greater detail in
Section 5.6. Since ignoring inter-argument dependencies implies that
inter-variable dependencies are ignored, it follows that such approaches are
strictly less accurate than set based approximation.
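
The enlargement described for Figure 5.3 can be reproduced on the two-element
set of values for X. The Python sketch below is illustrative only: the tuple
encoding of cons cells and the helper `ignore_inter_argument` are assumptions
made here, and the helper recombines arguments at the top level only rather
than recursively.

```python
from itertools import product

NIL = ("nil",)

def cons(head, tail):
    """Encode the value cons(head, tail) as a tuple."""
    return ("cons", head, tail)

# Values of X before statement 3 in Figure 5.3.
x_values = {cons(1, cons(2, NIL)), cons(2, NIL)}

def ignore_inter_argument(values):
    """Recombine first and second arguments of cons values independently."""
    heads = {v[1] for v in values}
    tails = {v[2] for v in values}
    return {cons(h, t) for h, t in product(heads, tails)}

enlarged = ignore_inter_argument(x_values)
extra = enlarged - x_values
print(len(enlarged), len(extra))
```

The two values in `extra` are cons(2, cons(2, nil)) and cons(1, nil), the
same values named in the discussion of Figure 5.3.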


5.5  Examples 


We now give some examples of set based approximations. First we
present some imperative program examples. Figure 5.4 contains the
environment constraints and set based approximation of the program
consisting of the single statement X := pair(X,X). Figure 5.5 contains the
constraints and set based approximation of an imperative program for
computing the last element of a list. Note that in the set based
approximation of this program, the set of values for X at the end of the
program is {a,b,c}. This is clearly an approximation of the run-time
behavior of the program since the only possible value for X at the end of
program execution is b.

The remaining two figures give examples involving logic programs.
Figure 5.6 presents the bottom-up set based approximations of two logic
programs and illustrates that inter-variable dependencies are ignored in set
based approximations, but inter-argument dependencies are not ignored.
Finally, Figure 5.7 presents the top-down set based approximation of a logic
program that computes the last element of a list. Note that the set based
approximation of this program is exact in the sense that the set assigned to
V at point 2 is {b} and these are precisely the possible values for V at this
point.  Intuitively,  this  is  because  the  set  of  possible  "calls”  (this  is  given 
by  the  union  of  the  sets  loop(a.b.nil,V).9^  and  loop(LyY).9*)  is  computed 
exactly, and the only possible way that loop(W.nil,W) can match this set is
with W = b. Moreover the rule loop(X.L,Y) ← loop(L,Y) cannot generate
any new answers for the second argument to loop. Note that the bottom-up
set based analysis of this program would not be exact.

    $\Psi^{\uparrow 1} \supseteq \top$

    $\Psi^{\downarrow 1} \supseteq \Psi^{\uparrow 1}[X \mapsto pair(X,X)]$

    $[X \mapsto \{\text{all values}\}]$

Figure 5.4: Set Based Approximation of X := pair(X, X)

    1.  L := cons(a, cons(b, nil));
    2.  X := c;
    3.  while (match_cons(L)) do
    4.      X := car(L);
    5.      L := cdr(L);

Figure 5.5: Set Based Approximation of Program 2

    First program:
        1.  ← q(X).
        2.  q(Y) ← p(Y).
        3.  p(f(a, b)).
        4.  p(f(c, d)).

    Set based approximation of the first program:
        1:  [X ↦ {f(a,b), f(c,d)}]
        2:  [Y ↦ {f(a,b), f(c,d)}]
        3:  {all environments}
        4:  {all environments}

    Second program:
        1.  ← q(X).
        2.  q(f(Y1, Y2)) ← p(f(Y1, Y2)).
        3.  p(f(a, b)).
        4.  p(f(c, d)).

    Set based approximation of the second program:
        1:  [X ↦ {f(a,b), f(a,d), f(c,b), f(c,d)}]
        2:  [Y1 ↦ {a, c}, Y2 ↦ {b, d}]
        3:  {all environments}
        4:  {all environments}

Figure 5.6: Bottom-Up Set Based Approx. of Two Logic Programs

    ← loop(a.b.nil, Y).
    loop(W.nil, W).
    loop(X.L, Y) ← loop(L, Y).

Figure 5.7: Top-Down Set Based Approximation of Program 10

In the literature, the notion of program approximation appears both in
the areas of types and program analysis. For the purposes of the
following discussion, the distinction between these two areas is
somewhat artificial, and it is more useful to classify these works
according to the underlying approach used to obtain program
approximation. Broadly speaking, two approaches have been used, one
based on abstract interpretation and the other based on the use of
closure operations and constraints. We shall refer to these as abstract
interpretation and non-abstract interpretation approaches, respectively.
We note that this terminology is somewhat loose because, if one takes a
very broad view of abstract interpretation, then many of the
non-abstract interpretation approaches can be viewed as abstract
interpretation. The essential difference between the two approaches is
that the abstract interpretation approach employs (a variant of) an
iterative fixed point computation to compute the program approximation,
whereas in the non-abstract interpretation approach the program
approximation cannot be computed by an iterative fixed point computation
(even with the aid of widening and narrowing), and so very different
computation techniques must be employed.


Abstract  Interpretation  Approaches 

In these approaches, program approximation is defined by specifying a
collection of approximate values in the place of the exact values. This
induces an approximate semantic function, and the program approximation
is typically the least fixed point of this function. Algorithms for
computing such approximations usually take the form of some kind of
iterative fixed point computation. Importantly, the approximate values
are chosen in such a way that such an iterative fixed point computation
is guaranteed to terminate.

In logic programs, this approach has been widely used in type inference
[38, 40, 67], sharing analysis [27, 51], instantiation analysis [45] and
in various combinations of these analyses [11, 15]. General frameworks
for abstract interpretation of logic programs have also been developed
by Bruynooghe [10] and by Marriott, Søndergaard and Jones [44, 60].

Similarly, the idea of using a collection of approximate values to
reason about programs appears, often somewhat implicitly, in most of the
work on analysis of imperative programs. The use of approximate values
is made more explicit in a number of the more formal accounts of this
approach. Early works in this area include papers by Sintzoff [58],
Kildall [39] and Wegbreit [66]. These ideas were further developed by
Cousot and Cousot [13, 14].

In functional programming, abstract interpretation has been used for a
variety of analyses including strictness analysis, sharing analysis, and
binding time analysis (see, for example, the collection of papers [1]).
There are also connections with type systems, particularly those
involving subtypes. In such systems, the starting point is some given
finite set of base types. This set is then closed under a small finite
number of type constructors, usually including the arrow type
constructor. The use of a finite predefined set of base types, and the
need to avoid infinite ascending chains of types, is in spirit similar
to the use of approximate values in abstract interpretation. In fact a
number of type inference problems can be usefully viewed as abstract
interpretation (for example, the refinement types of Freeman and
Pfenning [17]).

Although there is much diversity in the above works for logic,
imperative and functional languages, one unifying factor is the use of a
collection of approximate values that is finite or, more generally,
satisfies some kind of finite ascending chain condition. Moreover, this
condition is used to guarantee termination of the analysis.

In terms of accuracy, the program approximations defined by this kind of
approach are not directly comparable to set based approximations. For
some programs, abstract interpretation is more accurate. As an example,
if an imperative or logic program contains only constants (function
symbols of arity 0), then an abstract interpretation approach can be
used to compute the exact collecting semantics. This is because the
domain of values is finite, and so the collection of abstract values can
be chosen to be all possible sets of program values. However, the set
based approximation of such a program is, in general, an approximation
to its meaning. Conversely, there exist classes of programs for which
$\mathit{sba}_P$ and $\mathit{CS}_P$ coincide, but such that no abstract
interpretation algorithm can compute $\mathit{CS}_P$ for each program in
the class. For example, consider a program in which all function symbols
(and if applicable, predicate symbols) have arity 0 or 1. For such
programs $\mathit{sba}_P = \mathit{CS}_P$. However, no abstract
interpretation based approach can be used to compute exactly
$\mathit{CS}_P$ for each program in this class. This is essentially
because any regular language is definable by a monadic program. Hence,
if an abstract interpretation is to be exact on all monadic programs,
the collection of abstract values used must be expressive enough to
represent all regular languages, and this leads to termination problems.
A more detailed account of this argument is presented in Appendix 11.

In summary, the program approximations that arise in abstract
interpretation work can capture some information about inter-variable
dependencies, but they must embody other forms of approximation (to
ensure that the iterative fixed point computation terminates). On the
other hand, set based approximations ignore all information about
inter-variable dependencies, but make no other approximations.

For efficiency reasons, many abstract interpretation approaches ignore
all inter-variable dependencies. In this case, set based analysis is
more accurate than abstract interpretation. Moreover, abstract
interpretation that ignores inter-variable dependencies can be used to
provide an alternative definition of set based analysis as follows.
Consider the class of all abstract interpretations that ignore
inter-variable dependencies. In essence, each abstract interpretation in
this class is defined by an abstract domain that consists of a
collection of descriptions for sets of program values. In some sense,
the choice of abstract domain in each case is somewhat ad hoc, since the
accuracy of the analysis can always be improved by adding more
descriptions to the abstract domain. In contrast, set based analysis is
optimal in the sense that it corresponds to an inter-variable dependency
free abstract interpretation over the abstract domain consisting of all
possible sets of program values. Clearly, traditional iterative fixed
point techniques cannot be applied to compute over this domain, and
hence the set based analysis algorithm must use very different
techniques. We note that, given a program P, the output of the set based
analysis algorithm essentially defines a finite domain that is optimal
for the inter-variable dependency free abstract interpretation of P. In
other words, for every program P, the set based analysis algorithm
synthesizes a finite abstract domain $\mathcal{D}_P$ that is at least as
good as any (finite or infinite) abstract domain for analyzing P, and
moreover corresponds to the set based analysis of P. The optimal domain
$\mathcal{D}_P$ is clearly different for different programs.

    p(f(a, b)).             $T_p \subseteq p(f(a,b)) \uplus p(f(b,a))$
    p(f(b, a)).             $T_q \subseteq q(\mathcal{X})$
    q(X) ← p(f(X, X)).      $p(f(\mathcal{X},\mathcal{X})) \subseteq T_p$

Figure 5.8: Program 11 and Its Set Formula


Non Abstract Interpretation Approaches

Instead of using a collection of approximate values to obtain a
decidable program approximation, these approaches approximate
inter-variable or inter-argument dependencies. They are usually based on
the use of set formulas (constructed from a program) or closure
operators. Since there are close connections between these approaches
and set based analysis, we shall consider this part of the literature in
considerable detail.

We begin by discussing approaches to the approximation of logic
programs. One of the early works in this area was by Mishra [48] and
involved the use of set formulas to approximate a program's success set.
Specifically, a set formula was constructed from a given program P, and
it was shown that the greatest model of this formula was a superset of
the success set of P. We now illustrate the construction of these
formulas. Figure 5.8 contains Program 11 and its corresponding set
formula. In this formula, the variables $T_p$ and $T_q$ are intended to
be the subsets of atoms in the success set of Program 11 that involve
the predicates p and q respectively. The variable $\mathcal{X}$ denotes
a set of values and is intended to capture the values of the program
variable X. The operator $\uplus$ is similar to set union except that it
performs a tuple closure, and this serves to ignore inter-argument
dependencies. For example, $\{f(a,b)\} \uplus \{f(b,a)\}$ is defined to
be $\{f(a,a), f(a,b), f(b,a), f(b,b)\}$. More formally, if $S_1$ and
$S_2$ are two sets of values, then $S_1 \uplus S_2$ is
$(S_1 \cup S_2)^{\star}$, where $\star$ is defined to be the following
closure operator:

$$S^{\star} \stackrel{\mathrm{def}}{=} \{c : c \text{ is a constant in } S\} \;\cup\; \bigcup_{f \in \Sigma} f\bigl((f^{-1}_{(1)}(S))^{\star}, \ldots, (f^{-1}_{(n)}(S))^{\star}\bigr)$$

where $f(S_1,\ldots,S_n)$ denotes the set
$\{f(s_1,\ldots,s_n) : s_i \in S_i\}$ and $f^{-1}_{(i)}(S)$ denotes the
set $\{s_i : f(s_1,\ldots,s_n) \in S\}$. An interpretation $\mathcal{I}$
of a set formula is defined by specifying a set of ground atoms for each
variable of the form $T_p$, and a set of values for each program
variable. Interpretations are ordered pointwise as follows:
$\mathcal{I} \supseteq \mathcal{I}'$ if, for each variable, the set
specified by $\mathcal{I}$ is larger than or equal to the one specified
by $\mathcal{I}'$. An interpretation that satisfies the formula is
called a model. For example, the greatest model of the set formula in
Figure 5.8 is

$$T_p \mapsto \{p(f(a,a)),\, p(f(a,b)),\, p(f(b,a)),\, p(f(b,b))\}$$

$$T_q \mapsto \{q(a),\, q(b)\}$$

$$\mathcal{X} \mapsto \{a,\, b\}$$
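
The closure operator can be made concrete for finite sets of ground
values. The following Python sketch is only an illustration under stated
assumptions: constants are encoded as strings, a compound value
f(t1, ..., tn) as the tuple ("f", t1, ..., tn), and the names `star` and
`uplus` are chosen here for the example. The recursion terminates on
finite inputs because each recursive call works on strictly smaller
subterms.

```python
from itertools import product

def star(values):
    """Tuple closure of a finite set of ground values."""
    values = set(values)
    # Constants that occur in the set are kept as they are.
    closed = {v for v in values if isinstance(v, str)}
    # Every (symbol, arity) pair that heads some compound value in the set.
    shapes = {(v[0], len(v) - 1) for v in values if isinstance(v, tuple)}
    for f, n in shapes:
        # Column i collects the i-th arguments of all f-terms, closed recursively.
        columns = [
            star({v[i] for v in values
                  if isinstance(v, tuple) and v[0] == f and len(v) - 1 == n})
            for i in range(1, n + 1)
        ]
        # Recombine the columns freely, which forgets inter-argument dependencies.
        closed |= {(f,) + args for args in product(*columns)}
    return closed

def uplus(s1, s2):
    """Union followed by tuple closure."""
    return star(set(s1) | set(s2))
```

With this encoding, `uplus({("f", "a", "b")}, {("f", "b", "a")})`
produces the four f-terms over a and b given in the example above.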

Using the same kind of approximation, Yardeni and Shapiro [68] defined a
notion of program approximation based on the immediate consequence
operator $T_P$. Recall from Section 4.2, page 63, that the immediate
consequence operator $T_P$ is defined by

$$T_P(S) = \{\rho(A_0) : A_0 \leftarrow A_1,\ldots,A_n \text{ is a rule in } P \text{ and } \rho(A_i) \in S,\ 1 \le i \le n\}$$

and that the least fixed point of this function, denoted
$\mathit{lfp}(T_P)$, can be used to define a semantics of logic
programs. Yardeni and Shapiro modified this definition using the
$\star$ operator. Specifically, an approximate immediate consequence
operator $Y_P$ was defined by $Y_P(S) = (T_P(S))^{\star}$. This gives
rise to an approximate program semantics $\mathit{lfp}(Y_P)$. Clearly
$Y_P(S) \supseteq T_P(S)$ for each S, and it follows that
$\mathit{lfp}(Y_P)$ is a conservative approximation of P in the sense
that $\mathit{lfp}(Y_P) \supseteq \mathit{lfp}(T_P)$.

In [25], Heintze and Jaffar showed that the approximations defined by
the set formulas of Mishra and the $Y_P$ operator of Yardeni and Shapiro
are very closely related. In essence, the greatest model of the set
formulas corresponds to the greatest fixed point of $Y_P$. [25] also
shows how the set formulas can be re-engineered so that the
correspondence becomes exact. Specifically, it is shown how, from a
program P, set formulas can be constructed such that a model of the set
formulas is a fixed point of $Y_P$ and vice-versa. (Strictly speaking,
this correspondence may not be exact for a small class of degenerate
programs.)

In summary, the set formulas and the $Y_P$ operator both approximate
programs by ignoring dependencies between arguments of function symbols.
The advantage of using set formulas is that they provide a natural
starting point for development of algorithms. Partial algorithms were
reported in [48]. The advantage of using $T_P$-like operators is that
they provide a closer connection with standard notions of logic program
semantics, and hence give better insight into the nature of the
approximation.




CHAPTER  5.  SET  BASED  APPROXIMATION 


The notion of closure embodied in $\star$ is in some sense very extreme;
it forces all dependencies to be ignored. Moreover, it behaves in an
unbounded manner (see the recursive case in the definition of $\star$),
and this appears to introduce substantial complexity. It has yet to be
shown that approximations using $\star$, such as $\mathit{lfp}(Y_P)$,
are decidable. One reason for the extra complexity is that the usual
distributive laws do not hold when $\cup$ is replaced by $\uplus$. That
is, $(S_1 \uplus S_2) \cap S_3$ does not in general equal
$(S_1 \cap S_3) \uplus (S_2 \cap S_3)$.
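
A small worked instance may help to see why distributivity fails. The
three sets below are chosen here purely for illustration; they reuse the
values from the tuple closure example given earlier.

```latex
\begin{align*}
S_1 &= \{f(a,b)\}, \qquad S_2 = \{f(b,a)\}, \qquad S_3 = \{f(a,a)\} \\
(S_1 \uplus S_2) \cap S_3
  &= \{f(a,a),\, f(a,b),\, f(b,a),\, f(b,b)\} \cap \{f(a,a)\}
   = \{f(a,a)\} \\
(S_1 \cap S_3) \uplus (S_2 \cap S_3)
  &= \emptyset \uplus \emptyset = \emptyset
\end{align*}
```

The value f(a,a) is created only by the closure step, so intersecting
before closing loses it.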


In [21], Heintze and Jaffar proposed a more accurate approximate
consequence operator, $\mathcal{T}_P$. Instead of ignoring all
dependencies, $\mathcal{T}_P$ approximates programs by ignoring
inter-variable dependencies. In essence, this work was an early version
of the bottom-up set based analysis of logic programs. To define
$\mathcal{T}_P$, recall the definition of $\mathcal{A}$, which maps a
collection of environments $\Theta$ into the smallest set environment
that contains $\Theta$. Specifically, $\mathcal{A}(\Theta)$ is the set
environment that maps each variable X into
$\{\theta(X) : \theta \in \Theta\}$. Now, we have already observed how
set environments can be treated as mappings from terms into sets of
values. For example, if $\varrho$ is the set environment
$[X \mapsto \{a,c\},\ Y \mapsto \{b,d\}]$ then $\varrho(f(X,Y))$ denotes
the set $\{f(a,b), f(a,d), f(c,b), f(c,d)\}$. Using these definitions,
$\mathcal{T}_P$ can be defined as follows:

$$\mathcal{T}_P(I) \stackrel{\mathrm{def}}{=} \{a \in \varrho(A_0) : A_0 \leftarrow A_1,\ldots,A_n \in P \text{ and } \varrho = \mathcal{A}(\{\rho : \rho(A_1) \in I, \ldots, \rho(A_n) \in I\})\}$$

where I is a set of ground atoms. Intuitively, $\mathcal{T}_P$ first
collects together the environments for a rule that instantiate the body
atoms into elements in I. Using this set of environments, it collects
together the possible values that each variable may be instantiated to,
ignoring relationships between these variables. A set environment is
then defined as the mapping from each variable into the corresponding
collected set of values, and finally, this set environment is applied to
the head of the rule.
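
The two ingredients of this definition, the operator that collapses a
collection of environments into a set environment and the application of
a set environment to a term, can be sketched in Python as follows. The
encoding is an assumption made for this example: variables are
capitalised strings, constants are lowercase strings, compound terms are
tuples ("f", t1, ..., tn), and environments are dictionaries. The sketch
also assumes that a variable occurring several times in a term takes the
same value at each occurrence.

```python
from itertools import product

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def variables(term):
    """All variables occurring in a term."""
    if is_var(term):
        return {term}
    if isinstance(term, str):
        return set()
    return set().union(*(variables(a) for a in term[1:]))

def subst(term, env):
    """Instantiate a term with an ordinary environment (variable -> value)."""
    if is_var(term):
        return env[term]
    if isinstance(term, str):
        return term
    return (term[0],) + tuple(subst(a, env) for a in term[1:])

def alpha(envs):
    """Collapse environments into a set environment (variable -> set of values)."""
    set_env = {}
    for env in envs:
        for name, value in env.items():
            set_env.setdefault(name, set()).add(value)
    return set_env

def apply_set_env(set_env, term):
    """All instances of term obtained by choosing each variable's value freely."""
    names = sorted(variables(term))
    choices = [set_env[name] for name in names]
    return {subst(term, dict(zip(names, combo))) for combo in product(*choices)}
```

For instance, collapsing the two environments [X ↦ a, Y ↦ b] and
[X ↦ c, Y ↦ d] and applying the result to f(X, Y) yields the four terms
listed in the example above, two of which correspond to no original
environment.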


To illustrate the difference between $Y_P$ and $\mathcal{T}_P$, consider
Figure 5.9, which gives an example logic program and the corresponding
least fixed points of $Y_P$, $\mathcal{T}_P$ and $T_P$. For simplicity,
only the subsets of atoms with predicate q are given; we abbreviate
these sets by $\mathit{lfp}(Y_P)|_q$, $\mathit{lfp}(\mathcal{T}_P)|_q$
and $\mathit{lfp}(T_P)|_q$ respectively. For $Y_P$, the approximation is
at the level of arguments, and this is reflected by the presence of
subterms such as f(a,d) and f(c,b) in $\mathit{lfp}(Y_P)|_q$. In
contrast, $\mathit{lfp}(\mathcal{T}_P)$ does not contain such elements
and is strictly smaller than $\mathit{lfp}(Y_P)$.
$\mathit{lfp}(\mathcal{T}_P)$ does however contain elements that do not
appear in $\mathit{lfp}(T_P)$. This relationship holds in general. That
is, for all programs P,
$\mathit{lfp}(T_P) \subseteq \mathit{lfp}(\mathcal{T}_P) \subseteq \mathit{lfp}(Y_P)$.
This just reflects the fact that approximations defined by ignoring
inter-variable dependencies are more accurate than those defined by
ignoring all dependencies.

    Program:
        p(f(a,b), a).
        p(f(c,d), b).
        q(X,Y) ← p(X,Y).

    $\mathit{lfp}(Y_P)|_q$:
        q(f(a,b), a)    q(f(a,d), a)    q(f(c,d), a)    q(f(c,b), a)
        q(f(c,d), b)    q(f(a,d), b)    q(f(a,b), b)    q(f(c,b), b)

    $\mathit{lfp}(\mathcal{T}_P)|_q$:
        q(f(a,b), a)    q(f(c,d), a)    q(f(c,d), b)    q(f(a,b), b)

    $\mathit{lfp}(T_P)|_q$:
        q(f(a,b), a)    q(f(c,d), b)

Figure 5.9: Differences between $T_P$, $\mathcal{T}_P$ and $Y_P$
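
The three least fixed points for the program of Figure 5.9 can be
reproduced with a short Python sketch. This is an illustration under
stated assumptions rather than the algorithm of this thesis: terms are
encoded as tuples, variables as capitalised strings, every head variable
is assumed to occur in the rule body, and the plain iteration from the
empty set terminates only because this program generates finitely many
atoms. All function names are chosen for the example.

```python
from itertools import product

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, ground, env):
    """Return env extended so that pattern instantiates to ground, or None."""
    if is_var(pattern):
        if pattern in env:
            return env if env[pattern] == ground else None
        return {**env, pattern: ground}
    if isinstance(pattern, str) or isinstance(ground, str):
        return env if pattern == ground else None
    if pattern[0] != ground[0] or len(pattern) != len(ground):
        return None
    for p, g in zip(pattern[1:], ground[1:]):
        env = match(p, g, env)
        if env is None:
            return None
    return env

def subst(t, env):
    if is_var(t):
        return env[t]
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(subst(a, env) for a in t[1:])

def body_envs(body, atoms):
    """Environments that send every body atom into the given set of atoms."""
    envs = [{}]
    for b in body:
        envs = [e for e in (match(b, g, env) for env in envs for g in atoms)
                if e is not None]
    return envs

def t_exact(rules, atoms):
    """Immediate consequence operator: keep each environment intact."""
    return {subst(h, e) for h, body in rules for e in body_envs(body, atoms)}

def t_set_based(rules, atoms):
    """Collect each variable's values separately, then recombine them freely."""
    out = set()
    for head, body in rules:
        envs = body_envs(body, atoms)
        if not envs:
            continue
        names = sorted({v for e in envs for v in e})
        choices = [{e[v] for e in envs} for v in names]
        out |= {subst(head, dict(zip(names, c))) for c in product(*choices)}
    return out

def star(values):
    """Tuple closure, treating predicate symbols like function symbols."""
    values = set(values)
    closed = {v for v in values if isinstance(v, str)}
    shapes = {(v[0], len(v) - 1) for v in values if isinstance(v, tuple)}
    for f, n in shapes:
        cols = [star({v[i] for v in values if isinstance(v, tuple)
                      and v[0] == f and len(v) - 1 == n}) for i in range(1, n + 1)]
        closed |= {(f,) + args for args in product(*cols)}
    return closed

def y_op(rules, atoms):
    return star(t_exact(rules, atoms))

def lfp(op, rules):
    """Iterate op from the empty set until it stabilises."""
    atoms = set()
    while True:
        nxt = op(rules, atoms)
        if nxt == atoms:
            return atoms
        atoms = nxt

RULES = [
    (("p", ("f", "a", "b"), "a"), []),
    (("p", ("f", "c", "d"), "b"), []),
    (("q", "X", "Y"), [("p", "X", "Y")]),
]

def only_q(atoms):
    return {a for a in atoms if a[0] == "q"}
```

Restricting each result with `only_q` gives sets of sizes two, four and
eight for `t_exact`, `t_set_based` and `y_op` respectively, matching the
three columns of Figure 5.9.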


In [25], Heintze and Jaffar defined a further operator whose accuracy is
between that of $\mathcal{T}_P$ and $Y_P$. This operator, called $Z_P$,
differs from $\mathcal{T}_P$ in that it treats each variable occurrence
separately. Specifically, an occurrence based environment is a mapping
from pairs (X, i) into values, where X is a variable and i is an integer
indicating a variable occurrence. We adopt the convention that
occurrences of variables are labeled left to right. For example,
applying
$\{(X,1) \mapsto a,\ (X,2) \mapsto b,\ (X,3) \mapsto c,\ (Y,1) \mapsto d,\ (Y,2) \mapsto c\}$
to the sequence of atoms p(X), q(X,Y) yields p(a), q(b,d). The
$\mathcal{A}$ approximation operator can now be adapted to map
collections of occurrence based environments into a set environment as
follows: $\mathcal{A}(\Theta)$ maps each variable X into

$$\{u : \text{for all } i, \text{ there exists } \rho \in \Theta \text{ s.t. } \rho(X,i) = u\}$$

The operator $Z_P$ can now be defined similarly to $\mathcal{T}_P$,
except that environments are replaced by occurrence based environments.

$$Z_P(I) \stackrel{\mathrm{def}}{=} \{a \in \varrho(A_0) : A_0 \leftarrow A_1,\ldots,A_n \in P \text{ and } \varrho = \mathcal{A}(\{\rho : \rho(A_i) \in I,\ 1 \le i \le n\})\}$$


The variable $\varrho$ ranges over set environments and $\rho$ ranges
over occurrence based environments. In general $Z_P$ is less accurate
than $\mathcal{T}_P$ (because it ignores dependencies between different
occurrences of the same variable) but more accurate than $Y_P$ (because
it does not ignore all inter-argument dependencies). Figure 5.10
illustrates the difference between $T_P$, $\mathcal{T}_P$, $Z_P$ and
$Y_P$.

    Program:
        p(f(a,b)).
        p(f(b,a)).
        r(X) ← p(f(X,X)).
        s(f(Y,Z)) ← p(f(Y,Z)).

    $\mathit{lfp}(T_P)$:
        p(f(a,b)), p(f(b,a)),
        s(f(a,b)), s(f(b,a))

    $\mathit{lfp}(\mathcal{T}_P)$:
        p(f(a,b)), p(f(b,a)),
        s(f(a,a)), s(f(a,b)), s(f(b,a)), s(f(b,b))

    $\mathit{lfp}(Z_P)$:
        p(f(a,b)), p(f(b,a)),
        r(a), r(b),
        s(f(a,a)), s(f(a,b)), s(f(b,a)), s(f(b,b))

    $\mathit{lfp}(Y_P)$:
        p(f(a,a)), p(f(a,b)), p(f(b,a)), p(f(b,b)),
        r(a), r(b),
        s(f(a,a)), s(f(a,b)), s(f(b,a)), s(f(b,b))

Figure 5.10: Differences between $T_P$, $\mathcal{T}_P$, $Z_P$ and $Y_P$
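
The effect of treating occurrences separately can be seen on the rule
r(X) ← p(f(X,X)) of Figure 5.10. The Python sketch below is only an
illustration: the tuple encoding of terms, the "X#i" naming of
occurrences and all function names are assumptions made here. It also
assumes that the head occurrence of X is unconstrained by the body, so
only body occurrences take part in the intersection.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, ground, env):
    """Return env extended so that pattern instantiates to ground, or None."""
    if is_var(pattern):
        if pattern in env:
            return env if env[pattern] == ground else None
        return {**env, pattern: ground}
    if isinstance(pattern, str) or isinstance(ground, str):
        return env if pattern == ground else None
    if pattern[0] != ground[0] or len(pattern) != len(ground):
        return None
    for p, g in zip(pattern[1:], ground[1:]):
        env = match(p, g, env)
        if env is None:
            return None
    return env

def label(term, counts):
    """Rename the i-th occurrence (left to right) of variable X to 'X#i'."""
    if is_var(term):
        counts[term] = counts.get(term, 0) + 1
        return f"{term}#{counts[term]}"
    if isinstance(term, str):
        return term
    return (term[0],) + tuple(label(a, counts) for a in term[1:])

def occurrence_alpha(envs):
    """Keep a value for X only if every labelled occurrence of X can take it."""
    per_occurrence = {}
    for env in envs:
        for occ, value in env.items():
            per_occurrence.setdefault(occ, set()).add(value)
    set_env = {}
    for occ, values in per_occurrence.items():
        name = occ.split("#")[0]
        set_env[name] = set_env[name] & values if name in set_env else set(values)
    return set_env

FACTS = {("p", ("f", "a", "b")), ("p", ("f", "b", "a"))}
BODY = ("p", ("f", "X", "X"))            # body of the rule r(X) ← p(f(X,X))

# Ordinary environments: X must take one value in both argument positions.
plain_envs = [e for e in (match(BODY, g, {}) for g in FACTS) if e is not None]

# Occurrence based environments: occurrence 1 is the head occurrence in r(X).
labelled = label(BODY, {"X": 1})
occ_envs = [e for e in (match(labelled, g, {}) for g in FACTS) if e is not None]
r_values = occurrence_alpha(occ_envs).get("X", set())
```

No ordinary environment matches the body, which is why neither $T_P$ nor
$\mathcal{T}_P$ derives any r atom, whereas the occurrence based version
lets X range over both a and b.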


We now compare $Y_P$, $Z_P$ and $\mathcal{T}_P$ to set based analysis.
As has already been mentioned, the work on $\mathcal{T}_P$ was
essentially an early version of bottom-up set based analysis for logic
programs. Specifically, given a logic program P,
$\mathit{lfp}(\mathcal{T}_P)$ corresponds exactly to the least set based
model of the bottom-up environment constraints of P. It follows that set
based analysis is strictly more accurate than the $Y_P$ and $Z_P$
approaches to program approximation.

We now sketch the proof of the equivalence of
$\mathit{lfp}(\mathcal{T}_P)$ and the least set based model of bottom-up
$\mathcal{EC}_P$. First, the two approximations must be put into the
same form, since $\mathcal{T}_P$ defines an approximation to the set of
successful ground goals for P, whereas the least set based model of
bottom-up $\mathcal{EC}_P$ associates a set environment to each program
rule. However, $\mathcal{T}_P$ can be thought of as associating
collections of environments to each program rule in a natural way. To
see how this may be done, first recall the definition of
$\mathcal{T}_P$.

$$\mathcal{T}_P(I) \stackrel{\mathrm{def}}{=} \{a \in \varrho(A_0) : A_0 \leftarrow A_1,\ldots,A_n \in P \text{ and } \varrho = \mathcal{A}(\{\rho : \rho(A_i) \in I,\ 1 \le i \le n\})\}$$


Implicit in this definition is the computation of a set environment
corresponding to each program rule. To make this explicit, consider
replacing the set I of ground atoms with a set of pairs $(B, \varrho)$,
where B is the head of a program rule and $\varrho$ is a set
environment. Then $\mathcal{T}_P$ can be defined by

$$\mathcal{T}_P(I) = \Bigl\{(A_0, \mathcal{A}(\Theta)) : A_0 \leftarrow A_1,\ldots,A_n \in P \text{ and } \Theta = \Bigl\{\rho : \rho(A_i) \in \bigcup_{(B,\varrho) \in I} \varrho(B),\ 1 \le i \le n\Bigr\}\Bigr\}$$

where the least fixed point of this alternative function defines a
collection of pairs $(B, \varrho)$ such that taking the union of the
sets $\varrho(B)$ over all these pairs recovers the least fixed point of
the original $\mathcal{T}_P$ definition.

The least fixed point of this definition is just the least solution of
the equation $\mathcal{T}_P(I) = I$, and so the least fixed point can be
characterized as the least solution of

$$I = \Bigl\{(A_0, \mathcal{A}(\Theta)) : A_0 \leftarrow A_1,\ldots,A_n \in P \text{ and } \Theta = \Bigl\{\rho : \rho(A_i) \in \bigcup_{(B,\varrho) \in I} \varrho(B),\ 1 \le i \le n\Bigr\}\Bigr\}$$

where solutions to this equation are ordered as follows: $I_1 \le I_2$
if, for all head atoms B, $(B, \varrho_1) \in I_1$ and
$(B, \varrho_2) \in I_2$ implies that $\varrho_1 \subseteq \varrho_2$.


Now, a solution I of this equation is a specification of a set
environment for each rule in P. Using the fact that each rule in a logic
program is assumed to have a unique label, I can be viewed as a
specification of a set of elements of the form $\varrho^{\alpha}$, one
for each rule $R^{\alpha}$ in P. Using this notion, the single equation
above can be decomposed into a collection of equations, one for each
rule label $\alpha$, as follows:

$$\varrho^{\alpha} = \mathcal{A}\Bigl(\Bigl\{\rho : \rho(A_i) \in \bigcup_{R^{\beta} \in P} \varrho^{\beta}(\mathit{head}(R)),\ 1 \le i \le n\Bigr\}\Bigr) \qquad (5.10)$$

where $A_1,\ldots,A_n$ are the body atoms of the rule labeled $\alpha$
and head(R) denotes the head of the rule R. Now, consider replacing
equality in these constraints by inequality to obtain the constraints:

$$\varrho^{\alpha} \supseteq \mathcal{A}\Bigl(\Bigl\{\rho : \rho(A_i) \in \bigcup_{R^{\beta} \in P} \varrho^{\beta}(\mathit{head}(R)),\ 1 \le i \le n\Bigr\}\Bigr) \qquad (5.11)$$

where $\alpha$ ranges over rule labels in P. Importantly, it can be
shown that the least model of (5.11) is the same as the least model of
(5.10).


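Now, consider the expressions
$\rho(A_i) \in \bigcup_{R^{\beta} \in P} \varrho^{\beta}(\mathit{head}(R))$.
The only components of
$\bigcup_{R^{\beta} \in P} \varrho^{\beta}(\mathit{head}(R))$ that may
contain a term of the form $\rho(A_i)$ are those terms that match $A_i$.
Hence

$$\rho(A_i) \in \bigcup_{R^{\beta} \in P} \varrho^{\beta}(\mathit{head}(R)) \quad\text{iff}\quad \rho(A_i) \in \bigcup \bigl\{\varrho^{\beta}(\mathit{head}(R)) : R^{\beta} \in P \text{ and } \mathit{head}(R) \text{ and } A_i \text{ are compatible}\bigr\}$$

It follows that the constraints (5.11) are satisfied if and only if the
following constraints are satisfied:

$$\varrho^{\alpha} \supseteq \mathcal{A}\bigl(\{\rho : \rho(A_1) \in \varrho^{\beta_1}(B_1), \ldots, \rho(A_n) \in \varrho^{\beta_n}(B_n)\}\bigr) \qquad (5.12)$$

where the $\beta_i$ range over rule labels such that $B_i^{\beta_i}$ is
a head atom in P and $A_i$ and $B_i$ are compatible.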


Observe that the constraints (5.12) are virtually identical in meaning
to the set based interpretation of the bottom-up environment
constraints. In fact the only essential difference is that the variables
$\varrho^{\alpha}$ are used instead of the environment variables of the
environment constraints. Recalling that $R^{\beta} \in P$ indicates that
the rule with label $\beta$ in P is R, it is now easy to see that
bottom-up set based analysis of logic programs corresponds to
$\mathit{lfp}(\mathcal{T}_P)$ in the following sense:


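Proposition 15 Let P be a logic program, and let $\mathcal{EC}_P$ be the
bottom-up environment constraints for P. Then

$$\mathit{lfp}(\mathcal{T}_P) = \bigcup_{R^{\beta} \in P} \varrho^{\beta}(\mathit{head}(R))$$

where $\varrho^{\beta}$ is the set environment that $\mathit{sba}_P$
assigns to the environment variable labeled $\beta$. □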

We conclude by discussing the use of non abstract interpretation
approaches to the analysis of imperative and functional programs. The
two main works here are by Jones and Muchnick [32, 34] and Reynolds
[63]. Instead of developing an approximate program semantics in the
style of $Y_P$ or $\mathcal{T}_P$, these works focus on the use of set
constraints to obtain information about the possible run-time values of
program variables.

In [32], an analysis is described for an imperative language with
LISP-like data structures (this work was later generalized in [34]). The
essence of this approach is the construction of set constraints
corresponding to a program to capture the flow of values from one
variable to another as the program is executed. Underlying this work is
the intuition of treating program variables as sets of values, and this
is inherited by set based analysis. However, the set constraints are
constructed in such a manner that they can be solved by a fairly
straightforward algorithm. In particular, the set constraints do not
contain a notion of intersection, and their only operation is projection
(corresponding to decomposition of data structures). Hence they are not
expressive enough to capture a number of important components of
programs. For example, all information about the conditions in
conditional statements is completely omitted. Also, information relating
to well definedness of expressions is ignored (for example, after a
statement X := car(Y), it must be the case that Y is of the form
cons(...) because otherwise the program would have terminated with an
error).

In contrast, the earlier paper [63] uses set constraints to compute data
type definitions for program variables in a first order functional
language. The constraints used are similar to those used in [32]. Again
the only set operation of the constraints is projection, and so the
program approximations obtained are considerably less accurate than set
based approximations.

In summary, the set constraints used in [32, 63] are substantially
simpler than those used in set based analysis. This has the advantage
that they are much easier to solve. However, the program approximations
that they define are significantly less accurate than set based
approximations. Another major difference is that these approaches have
viewed constraints as a tool for obtaining information about the
program, and the constraints themselves incorporate a number of ad hoc
approximations in addition to ignoring inter-variable dependencies.

The general approach of [32, 63] has been extended by Jones [31] to deal
with higher order functions. This approach has been further developed
for binding time analysis [50], garbage collection [30] and
globalization of function parameters [56]. One presentational difference
in these works is the use of various extensions of regular grammars
instead of constraints. However, there is a strong duality between such
grammars and set constraints since there is a natural way to view set
constraints as grammars and vice-versa. In particular, the technical
details are broadly similar to [31, 32, 63].


Chapter  6 


Set  Constraints 


Previous chapters have described how environment constraints can be used
to characterize the run-time behavior of programs. Subsequently, set
based program approximation was defined by treating program variables as
sets of values. As a first step towards computing set based
approximations, we now show how environment constraints may be
translated into set constraints such that the least set based model of
the environment constraints corresponds to the least model of the set
constraints. This translation uses the fact that, when interpreted using
set based interpretations, certain aspects of environment constraints
may be significantly simplified.

We first describe the general form of the set constraints used and prove
some basic properties. Then, for each kind of environment constraint, we
give the translation into set constraints and show that it is correct.
In effect, this translation reduces the problem of computing the set
based approximation of a program into the problem of computing the least
model of a collection of set constraints.


6.1 Set Constraints


We define a scheme for set based calculi. This scheme is defined in the
context of some alphabet $\Sigma$ of function symbols, where each function
symbol $f$ comes equipped with a unique arity denoted $arity(f)$. The letters
$f$, $g$ and $h$ shall be used to denote function symbols. A function symbol
of arity 0 is called a constant. A value is an expression constructed from
the function symbols in $\Sigma$, viz: $f(v_1, \ldots, v_{arity(f)})$ is a
value if each $v_i$ is a value. We shall assume a countably infinite
collection $VAR$ of set variables. Set variables shall be denoted
$\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$, etc., and shall be interpreted
as sets of values.

The main parameter of the calculus scheme is a collection of operations
for combining sets of values. Specifically, let $OP$ be a collection of set
operations, where each operation $op \in OP$ has an associated arity denoted
$arity(op)$, as well as a meaning function $[op]$, which maps any sequence of
sets of values $(S_1, \ldots, S_{arity(op)})$ into a set of values. In the
context of such a collection of operations, define that a set expression
$se$ is either:

• a set variable;

• one of the special constants $\top$ or $\bot$;

• $se_1 \cup se_2$;

• $f(se_1, \ldots, se_n)$ where $f$ is an $n$-ary symbol from $\Sigma$, or

• $op(se_1, \ldots, se_n)$ where $op$ is an $n$-ary operation from $OP$,

where the $se_i$ are set expressions. Note that the constant $\top$ also
appears in environment constraints. However, it will always be clear from
context whether $\top$ represents an environment expression or a set
expression. A set constraint is of the form $se_1 \supseteq se_2$ or
$se_1 = se_2$. Collections of set constraints shall be denoted by the symbol
$\mathcal{C}$. Where $exp_1, \ldots, exp_n$, $n \geq 1$, is a sequence of set
expressions or set constraints, $var(exp_1, \ldots, exp_n)$ denotes the
collection of all set variables that appear in $exp_1, \ldots, exp_n$.

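To define the meaning of set expressions, let $\mathcal{I}$ be a mapping
from each set variable into sets of values. Then $\mathcal{I}(se)$ is
defined to be:

• $\mathcal{I}(\mathcal{X})$ if $se$ is a set variable $\mathcal{X}$;

• the set of all values, if $se$ is $\top$;

• the empty set $\{\}$, if $se$ is $\bot$;

• $\mathcal{I}(se_1) \cup \mathcal{I}(se_2)$, if $se$ is $se_1 \cup se_2$;

• $\{f(v_1, \ldots, v_n) : v_i \in \mathcal{I}(se_i)\}$ if $se$ is
  $f(se_1, \ldots, se_n)$, or

• $[op](\mathcal{I}(se_1), \ldots, \mathcal{I}(se_n))$ if $se$ is
  $op(se_1, \ldots, se_n)$.

$\mathcal{I}$ is a model of a constraint $se_1 \supseteq se_2$ or
$se_1 = se_2$ if $\mathcal{I}(se_1) \supseteq \mathcal{I}(se_2)$ or
$\mathcal{I}(se_1) = \mathcal{I}(se_2)$ respectively. $\mathcal{I}$ is a
model of a collection $\mathcal{C}$ of constraints, denoted
$\mathcal{I} \models \mathcal{C}$, if $\mathcal{I}$ is a model of each
constraint in the collection. Models shall be ordered componentwise:
$\mathcal{I}_1 \supseteq \mathcal{I}_2$ iff
$\mathcal{I}_1(\mathcal{X}) \supseteq \mathcal{I}_2(\mathcal{X})$ for each
set variable $\mathcal{X}$. We write $lm(\mathcal{C})$ to denote the least
model of $\mathcal{C}$ if it exists.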

Note that the meaning $\mathcal{I}(se)$ of a set expression $se$ is defined
in terms of the meanings of the immediate subexpressions of $se$, and it
follows that "equal" terms can be replaced in all contexts. Specifically,

Proposition 16 Let $se$ be a set expression that contains $se_1$ as a
subexpression. Let $se'$ be the result of replacing $se_1$ in $se$ by the
set expression $se_2$. If $\mathcal{I}(se_1) = \mathcal{I}(se_2)$ then
$\mathcal{I}(se) = \mathcal{I}(se')$.

Proof: By structural induction on $se$. []

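We now give some example constraints. Let $\mathcal{C}$ denote the single
constraint $\mathcal{X} \supseteq c \cup f(f(\mathcal{X}))$, where $c$ is a
constant and $f$ is a unary symbol. $\mathcal{C}$ has many models, including
the interpretation that maps all set variables into the set
$\{c, f(c), f(f(c)), \ldots\}$. Another model of $\mathcal{C}$ is the
interpretation $\mathcal{I}$ defined by

\[
\mathcal{I}(\mathcal{Y}) =
\begin{cases}
\{c, f^2(c), f^4(c), \ldots\} & \text{if } \mathcal{Y} \text{ is } \mathcal{X} \\
\{\} & \text{if } \mathcal{Y} \text{ is different from } \mathcal{X}
\end{cases}
\]

where $f^n$ abbreviates $n$ applications of $f$. This model is smaller than
the first, and is in fact the least model of $\mathcal{C}$.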
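Membership in this least model can be decided by peeling function symbols off a value. The following Python fragment is a minimal sketch of that idea for this one constraint only; the encoding of values as nested tuples and the helper names `in_least_model` and `f_power` are assumptions of the sketch, not notation from this thesis.

```python
def f_power(n):
    """Build the value f^n(c) as a nested tuple (hypothetical encoding)."""
    value = ("c",)
    for _ in range(n):
        value = ("f", value)
    return value


def in_least_model(value):
    """Decide value ∈ lm(C)(X) for C = { X ⊇ c ∪ f(f(X)) }.

    A value belongs to the least model only if the single lower bound
    produces it: it is c, or it is f(f(v)) for some v already in the model.
    """
    while True:
        if value == ("c",):
            return True
        if value[0] == "f" and value[1][0] == "f":
            value = value[1][1]     # strip two applications of f
        else:
            return False


if __name__ == "__main__":
    print([n for n in range(7) if in_least_model(f_power(n))])
```

Only even powers of $f$ are accepted, which agrees with the interpretation $\mathcal{I}$ given above.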

Without using operators from $OP$, the constraints that can be formed
have a simple structure. In particular, it is fairly easy to reason about
their least model because the structure of the least model is readily
apparent from the constraints. Specifically, if $\mathcal{C}$ is a collection
of constraints involving only set variables, $\bot$, $\top$, union and
function symbols, then there is a simple polynomial time algorithm for
determining $v \in lm(\mathcal{C})(\mathcal{X})$ where $v$ is a value and
$\mathcal{X}$ is a set variable. We shall discuss this further in the next
chapter, and also show that such collections $\mathcal{C}$ correspond to
regular tree grammars, or alternatively, regular tree automata.

The most interesting aspect of set constraints lies in the set operators
that make up the parameter $OP$. One of the simplest set operators is
intersection, which is given its usual set theoretic interpretation so that
if $\mathcal{I}(\mathcal{X}) = \{a,b,c\}$ and
$\mathcal{I}(\mathcal{Y}) = \{b,c,d\}$ then
$\mathcal{I}(\mathcal{X} \cap \mathcal{Y}) = \{b,c\}$. As an example of the
use of intersection, consider the following constraints:

\[
\begin{array}{rcl}
\mathcal{X} & \supseteq & a \cup f^2(\mathcal{X}) \\
\mathcal{Y} & \supseteq & a \cup f^3(\mathcal{Y}) \\
\mathcal{Z} & \supseteq & \mathcal{X} \cap \mathcal{Y}
\end{array}
\]

The least model of these constraints maps $\mathcal{X}$ into
$\{a, f^2(a), f^4(a), \ldots\}$, maps $\mathcal{Y}$ into
$\{a, f^3(a), f^6(a), \ldots\}$, maps $\mathcal{Z}$ into
$\{a, f^6(a), f^{12}(a), \ldots\}$, and maps all other variables into
$\{\}$. For convenience, we shall consider $n$-ary intersections, written
$\cap_n$ where $n \geq 2$. The subscript $n$ shall usually be omitted and an
expression $\cap_n(se_1, \ldots, se_n)$ shall be written as
$se_1 \cap \cdots \cap se_n$.
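The least model of this small system can be approximated by iterating the lower bounds from the empty interpretation until nothing changes. The Python sketch below is illustrative only and is not the set constraint algorithm of this thesis: it assumes that $f^n(a)$ is encoded as the integer $n$, and it cuts the infinite sets off at a hypothetical bound `BOUND` so that the iteration terminates.

```python
BOUND = 24  # hypothetical cut-off: only f^n(a) with n <= BOUND is kept


def shift(k, values):
    """Apply f^k to every encoded value, discarding results past BOUND."""
    return {n + k for n in values if n + k <= BOUND}


def least_model(constraints, variables):
    """Iterate lower bounds X ⊇ rhs(interp) up to a fixed point."""
    interp = {v: frozenset() for v in variables}
    changed = True
    while changed:
        changed = False
        for var, rhs in constraints:
            new = interp[var] | frozenset(rhs(interp))
            if new != interp[var]:
                interp[var] = new
                changed = True
    return interp


CONSTRAINTS = [
    ("X", lambda i: {0} | shift(2, i["X"])),   # X ⊇ a ∪ f^2(X)
    ("Y", lambda i: {0} | shift(3, i["Y"])),   # Y ⊇ a ∪ f^3(Y)
    ("Z", lambda i: i["X"] & i["Y"]),          # Z ⊇ X ∩ Y
]

if __name__ == "__main__":
    model = least_model(CONSTRAINTS, ["X", "Y", "Z"])
    print(sorted(model["Z"]))
```

Within the bound, the set computed for `Z` contains exactly the multiples of 6, matching the description of the least model above.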

Another kind of operator is projection. Specifically, for each $n$-ary
symbol $f \in \Sigma$, there are $n$ projection operators,
$f^{-1}_{(1)}, \ldots, f^{-1}_{(n)}$. The meaning of each operator is defined
by the function $[f^{-1}_{(i)}]$, which maps a set $S$ of values into
$\{v_i : f(v_1, \ldots, v_n) \in S\}$. For example, consider the following
constraint:

\[ \mathcal{X} \supseteq f(f(\mathcal{X})) \cup a \cup f^{-1}_{(1)}(\mathcal{X}) \]

The least model of this constraint maps $\mathcal{X}$ into
$\{a, f(a), f(f(a)), \ldots\}$ and maps all other variables into the empty
set.
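The meaning function of a projection operator is easy to state on finite sets of values. The following Python sketch assumes a hypothetical encoding of a value $f(v_1, \ldots, v_n)$ as the tuple `("f", v1, ..., vn)`; the function name `project` is likewise an assumption of the sketch.

```python
def project(symbol, index, values):
    """Return { v_i : symbol(v_1, ..., v_n) ∈ values }, with index i 1-based."""
    return {
        v[index]
        for v in values
        if v[0] == symbol and len(v) > index   # top symbol matches, argument exists
    }


# Example values (hypothetical): a, b, nil, cons(a, nil), cons(b, nil)
A, B, NIL = ("a",), ("b",), ("nil",)
S = {NIL, ("cons", A, NIL), ("cons", B, NIL)}

if __name__ == "__main__":
    print(project("cons", 1, S))   # first arguments of the cons values
    print(project("cons", 2, S))   # second arguments of the cons values
```

Values in `S` whose top symbol differs from the projected symbol, such as `nil` here, simply contribute nothing to the result.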

The next class of operators has arity 0. Their purpose is to allow a
very limited form of complementation to be expressed. Specifically, where
$se$ is a ground set expression, define that $\overline{se}$ is a complement
constant that represents the complement of $se$. That is, for any
interpretation $\mathcal{I}$,
$\mathcal{I}(\overline{se}) = \{v : v \notin \mathcal{I}(se)\}$. Since $se$
is ground, complement constants $\overline{se}$ have a fixed meaning over all
interpretations. We choose not to introduce complementation in its full
generality because it is not monotonic. The limited form of complementation
embodied in the complement constants proves to be useful for reasoning about
certain aspects of imperative programs such as inequality and negated match
conditions.

Although the ground set expression $se$ in a complement constant
$\overline{se}$ may be arbitrary, we shall not use this generality in the
constraints employed to compute $sba_P$. Specifically, we shall only use
complement constants that have the form $\overline{a_1 \cup \cdots \cup a_n}$
where each $a_i$ is either of the form $f(\top, \ldots, \top)$ or contains
only function symbols. For notational convenience, we shall frequently write
complement constants in the form $\overline{S}$ where $S$ is a set of ground
set expressions, so that if $S$ is $\{a_1, \ldots, a_n\}$ then $\overline{S}$
denotes $\overline{a_1 \cup \cdots \cup a_n}$. For example,
$\overline{\{nil, cons(\top, \top)\}}$ denotes
$\overline{nil \cup cons(\top, \top)}$, which describes the set of values
whose top-most symbol is not $nil$ or $cons$. We shall identify the constant
$\top$ with $\overline{\{\}}$.

Note that if $\Sigma$ is finite, then complement constants $\overline{S}$
can be considered to be a notation for a somewhat unwieldy ground set
expression. For example, if $\Sigma$ is $\{f, g, a\}$, then $\overline{f(a)}$
can be identified with the expression $g(\top) \cup a \cup f(\overline{a})$
where $\overline{a}$ denotes $f(\top) \cup g(\top)$. We choose to use explicit
complement constants because it gives a slightly more general treatment,
and leads to more efficient algorithms.

The final class of set operators used in this thesis are quantified
operators. In essence, a quantified operator of arity $n$ is a formula with
$n$ holes. We begin by defining these formulas. A quantified set expression
is of the form $\{X : conj\}$ where $X$ is a program variable and $conj$ is a
conjunction of quantified conditions of the form $s \in se$ or $s \wr se$
where $s$ is a program term and $se$ is a set expression. If $conj$ is the
empty conjunction, then $\{X : conj\}$ is identified with $\top$. Now, a
quantified operator is essentially a quantified set expression with the set
expressions missing. Specifically, a quantified operator $op$ consists of a
program variable $X$ and a sequence of $m \geq 0$ formulas, each of the form
$(s \in \bullet)$ or $(s \wr \bullet)$. The result of applying $op$ to a
sequence of set expressions $se_1, \ldots, se_m$ is

\[ \{X : conj_1 \wedge \cdots \wedge conj_m\} \]

where, for $i = 1..m$, $conj_i$ is $(s \in se_i)$ if the $i$-th formula in
$op$ is $(s \in \bullet)$, and $conj_i$ is $(s \wr se_i)$ if the $i$-th
formula in $op$ is $(s \wr \bullet)$. For example, if $op$ consists of the
program variable $X$ and the sequence of formulas
$(f(X) \in \bullet), (X \wr \bullet)$ then $op(f(\mathcal{Z}), \mathcal{Y})$
is $\{X : f(X) \in f(\mathcal{Z}) \wedge X \wr \mathcal{Y}\}$.

The meaning of quantified operators is defined by giving a meaning to
quantified set expressions. Let $\mathcal{I}$ be an interpretation that maps
each set variable into a set of values. Then $\mathcal{I}(\{X : conj\})$ is
defined to be $\{\rho(X) : \rho \in \mathcal{I}(conj)\}$ where
$\rho \in \mathcal{I}(conj)$ if

\[ \rho \triangleright s \;\wedge\; \rho(s) \in \mathcal{I}(se) \quad \text{for all } s \in se \text{ in } conj, \]

\[ \text{and } \rho \triangleright s \;\wedge\; \exists v\,(v \neq \rho(s) \wedge v \in \mathcal{I}(se)) \quad \text{for all } s \wr se \text{ in } conj. \]

Intuitively, $\rho \in \mathcal{I}(s \in se)$ if $\rho(s)$ is contained in
the set of values that $\mathcal{I}$ assigns $se$, and
$\rho \in \mathcal{I}(s \wr se)$ if $\mathcal{I}(se)$ contains a value
different from $\rho(s)$.

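For example, consider the following constraints:

\[
\begin{array}{rcl}
\mathcal{X} & \supseteq & \{X : X \wr \mathcal{Z} \wedge X \in \overline{f(\top)} \wedge X \in \mathcal{Y}\} \\
\mathcal{Y} & \supseteq & a \cup b \cup c \cup f(a) \\
\mathcal{Z} & \supseteq & c
\end{array}
\]

The least model of these constraints maps $\mathcal{Z}$ into $\{c\}$,
$\mathcal{Y}$ into $\{a, b, c, f(a)\}$ and $\mathcal{X}$ into $\{a, b\}$.
Note that if the constraint $\mathcal{Z} \supseteq a$ is added, then the
least model changes and $\mathcal{X}$ is now assigned $\{a, b, c\}$. This is
because the least model now maps $\mathcal{Z}$ into $\{a, c\}$, and so the
disjointness condition $X \wr \mathcal{Z}$ becomes vacuous.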
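This example can be replayed on finite sets. The Python sketch below evaluates the one quantified set expression of the example directly from the definition of its meaning; the tuple encoding of values and the function name `quantified_x` are assumptions of the sketch. Since the lower bounds for the other two variables are ground, their least-model values are passed in as plain sets.

```python
A, B, C = ("a",), ("b",), ("c",)
F_A = ("f", A)


def quantified_x(y_values, z_values):
    """Evaluate { X : X apart-from Z  ∧  X ∈ complement(f(⊤))  ∧  X ∈ Y }."""
    result = set()
    for x in y_values:                                  # X ∈ Y
        not_an_f = x[0] != "f"                          # X ∈ complement of f(⊤)
        apart = any(z != x for z in z_values)           # Z holds a value other than x
        if not_an_f and apart:
            result.add(x)
    return result


if __name__ == "__main__":
    y = {A, B, C, F_A}
    print(quantified_x(y, {C}))      # Z ⊇ c only
    print(quantified_x(y, {A, C}))   # after adding Z ⊇ a
```

With `Z = {c}` the value `c` is excluded because `Z` contains nothing different from it; once `a` is added to `Z`, every candidate has a distinct partner in `Z` and the apartness test no longer removes anything.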

Intuitively, quantified conditions of the form $s \in se$ are elementhood
relationships, and are generated from atomic program conditions of the form
$s = t$, $match_f(s)$ and $\neg match_f(s)$. Quantified conditions of the
form $s \wr se$ are "apartness" relationships and are generated from atomic
program conditions of the form $\neg(s = t)$. For example, consider the
program condition $X \neq Y$. In essence, this shall be translated to

\[ (X \in \mathcal{X}) \wedge (X \wr \mathcal{Y}) \wedge (Y \in \mathcal{Y}) \wedge (Y \wr \mathcal{X}) \]

where $\mathcal{X}$ and $\mathcal{Y}$ respectively denote the sets of values
for $X$ and $Y$ at the point just before execution of the conditional
statement. The idea is that the two apartness conditions capture the
requirement that $X$ and $Y$ assume distinct values. For example, if
$\mathcal{X}$ is $\{a, b\}$ and $\mathcal{Y}$ is $\{b\}$, then the only
possible values for $X$ and $Y$ after the conditional are $a$ and $b$
respectively.
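The effect of the two apartness conditions can be checked on small finite sets. The following Python sketch enumerates candidate environments as pairs of values and keeps those satisfying the four quantified conditions; the function name `refine_not_equal` and the use of strings for values are assumptions of the sketch.

```python
from itertools import product


def refine_not_equal(xs, ys):
    """Sets of values for X and Y after a condition X != Y, set based.

    A pair (x, y) is kept when x ∈ xs, y ∈ ys, ys contains a value
    different from x, and xs contains a value different from y.
    """
    kept = [
        (x, y)
        for x, y in product(xs, ys)
        if any(v != x for v in ys) and any(w != y for w in xs)
    ]
    return {x for x, _ in kept}, {y for _, y in kept}


if __name__ == "__main__":
    print(refine_not_equal({"a", "b"}, {"b"}))
    print(refine_not_equal({"a", "b"}, {"a", "b"}))
```

The second call illustrates the approximation involved: when both sets are `{a, b}`, nothing is removed, because the conditions relate each variable to the other variable's set rather than to its actual value.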

To summarize, the set operators used to compute $sba_P$ include
intersection, projections, complement constants and quantified operators. We
note that quantified operators may be viewed as a generalization of
projections because any expression $f^{-1}_{(i)}(se)$ can be translated into
the quantified expression $\{X_i : f(X_1, \ldots, X_n) \in se\}$ where $f$ is
$n$-ary and $X_1, \ldots, X_n$ are distinct program variables.


Central to our work on set constraints is the notion of least model. In
general, these models do not exist. For example, the constraint
$\mathcal{X} \cup \mathcal{Y} = a$ has two minimal models as follows

\[
\mathcal{I}_1(\mathcal{Z}) =
\begin{cases}
\{a\} & \text{if } \mathcal{Z} \text{ is } \mathcal{X} \\
\{\} & \text{otherwise}
\end{cases}
\qquad
\mathcal{I}_2(\mathcal{Z}) =
\begin{cases}
\{a\} & \text{if } \mathcal{Z} \text{ is } \mathcal{Y} \\
\{\} & \text{otherwise}
\end{cases}
\]

but does not have a least model. However, in certain circumstances, a least
model is guaranteed to exist. Specifically, define that a constraint is in
variable-expression form if it has the form $\mathcal{X} \supseteq se$ where
$\mathcal{X}$ is a variable and $se$ is a set expression. All of the
constraints used in this thesis shall have this form. Now, Corollary 4 of
Appendix I shows that the least model of a collection of variable-expression
form constraints exists if the set operators appearing in the constraints are
monotonic in all arguments. It is easy to verify that all of the operators we
have introduced are monotonic, and hence the collections of constraints we
shall consider will always have least models.
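The absence of a least model for the equational constraint above can be confirmed by brute force over a tiny universe. The Python sketch below is illustrative only: it restricts attention to the two variables of the constraint and to a hypothetical two-element universe `UNIVERSE`, enumerates every model, and then looks for minimal and least elements under the componentwise ordering.

```python
from itertools import combinations, product

UNIVERSE = ("a", "b")   # hypothetical finite universe of values


def subsets(items):
    """All subsets of items, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def models():
    """All pairs (X, Y) over UNIVERSE that satisfy X ∪ Y = {a}."""
    return [(x, y) for x, y in product(subsets(UNIVERSE), repeat=2)
            if x | y == frozenset({"a"})]


def leq(m1, m2):
    """Componentwise ordering on models."""
    return m1[0] <= m2[0] and m1[1] <= m2[1]


def minimal(ms):
    return [m for m in ms if not any(leq(o, m) and o != m for o in ms)]


def least(ms):
    return [m for m in ms if all(leq(m, o) for o in ms)]


if __name__ == "__main__":
    print(minimal(models()))
    print(least(models()))
```

The enumeration finds two incomparable minimal models and no model below all others, in line with the two interpretations displayed above.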


Any constraint of the form $\mathcal{X} \supseteq se$ shall be referred to
as a lower bound for the set variable $\mathcal{X}$ because in any model of
this constraint $\mathcal{X}$ contains at least $se$. The following two
propositions establish some useful properties about lower bounds; both
propositions are instances of more general propositions that can be found in
Appendix I (see propositions 48 and 49 respectively).

Proposition 17 Let $\mathcal{C}$ be a collection of constraints in
variable-expression form. If $v \in lm(\mathcal{C})(\mathcal{X})$ then
$\mathcal{C}$ contains a lower bound on $\mathcal{X}$ of the form
$\mathcal{X} \supseteq se$ such that $v \in lm(\mathcal{C})(se)$. []

Proposition 18 Let $\mathcal{C}$ be a collection of constraints in
variable-expression form in which all set operators are monotonic. If
$\mathcal{X} \supseteq se$ is the only lower bound for $\mathcal{X}$ then
$lm(\mathcal{C})(\mathcal{X}) = lm(\mathcal{C})(se)$. []

This last proposition implies that if $\mathcal{X}_1, \ldots, \mathcal{X}_n$
is a sequence of distinct variables, then any collection of constraints of
the form

\[ \mathcal{X}_1 \supseteq se_1, \ldots, \mathcal{X}_n \supseteq se_n \]

has the same least model as the constraints
$\mathcal{X}_1 = se_1, \ldots, \mathcal{X}_n = se_n$. Now, consider an
arbitrary collection of variable-expression form constraints and suppose that
the following step is repeatedly applied to the collection: replace two
constraints $\mathcal{X} \supseteq se$ and $\mathcal{X} \supseteq se'$ by the
single constraint $\mathcal{X} \supseteq se \cup se'$. The result is an
equivalent collection of constraints of the form
$\mathcal{X}_1 \supseteq se_1, \ldots, \mathcal{X}_n \supseteq se_n$ where
$\mathcal{X}_1, \ldots, \mathcal{X}_n$ are distinct variables. It follows
that variable-expression form collections of constraints are completely
interchangeable (w.r.t. least models) with constraints of the form
$\mathcal{X}_1 = se_1, \ldots, \mathcal{X}_n = se_n$ where
$\mathcal{X}_1, \ldots, \mathcal{X}_n$ are distinct. Hence, we can choose
between these two forms of constraints. In two previous papers on set
constraints [21, 24], we chose to use the equational form. However, in this
thesis we use the $\supseteq$ form because it simplifies some of the
presentation.
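The merging step that brings a collection into the form with one lower bound per variable is purely syntactic. The Python sketch below assumes a hypothetical representation in which a constraint is a pair of a variable name and a set expression, and in which a union is the tuple `("union", left, right)`; neither is notation from this thesis.

```python
def merge_lower_bounds(constraints):
    """Combine all lower bounds X ⊇ se for the same X into one X ⊇ se ∪ se' ...

    constraints: list of (variable, set_expression) pairs.
    Returns a dict with exactly one set expression per variable.
    """
    merged = {}
    for var, se in constraints:
        if var in merged:
            merged[var] = ("union", merged[var], se)   # X ⊇ se ∪ se'
        else:
            merged[var] = se
    return merged


if __name__ == "__main__":
    example = [("X", "a"), ("Y", "b"), ("X", ("f", "X"))]
    print(merge_lower_bounds(example))
```

Reading each entry of the resulting dictionary as an equation rather than a lower bound then gives the equational form mentioned above.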

We also note that when the set operators in $OP$ are monotonic,
proposition 16 can be strengthened as follows.

Proposition 19 Let $se$ be a set expression that contains $se_1$ as a
subexpression. Let $se'$ be the result of replacing $se_1$ in $se$ by the
set expression $se_2$. If $\mathcal{I}(se_1) \subseteq \mathcal{I}(se_2)$
then $\mathcal{I}(se) \subseteq \mathcal{I}(se')$. []

We conclude this section by relating variable-expression form constraints
and the definite set constraints considered by Heintze and Jaffar in [22].
Whereas variable-expression form constraints are of the form
$\mathcal{X} \supseteq se$, definite set constraints are of the form
$ae \supseteq se$ such that $ae$ is a set expression that does not contain
any set operators. Clearly definite set constraints are a strict
generalization of variable-expression form constraints. A collection of
definite constraints may not have any models, but if there is a model then
there is a least model. In essence, [22] proceeds by reducing such
constraints into constraints of the form $\mathcal{X} \supseteq se$, and the
core algorithm of the paper then solves these reduced constraints. The core
algorithm of [22] is essentially an early version of the set constraint
algorithm described in the next chapter.


6.2  Environment  Constraints  and  Set  Constraints 


We now translate environment constraints into set constraints in such a way
that the least set based interpretation of the environment constraint
corresponds to the least model of the set constraints. Let $P$ be a program
and let $X_1, \ldots, X_m$ denote the program variables in $P$. Now, for each
program point $\mu$, introduce set variables
$\mathcal{X}_1^\mu, \ldots, \mathcal{X}_m^\mu$. Intuitively, the set variable
$\mathcal{X}_i^\mu$ shall denote the set of values for the program variable
$X_i$ at the point $\mu$. In other words, the set environment $\Psi^\mu$
shall be represented by the tuple of set variables
$(\mathcal{X}_1^\mu, \ldots, \mathcal{X}_m^\mu)$.

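The construction of set constraints from environment constraints requires
the systematic replacement of program variables with set variables. For
example, consider the environment constraint $\Psi^\mu \supseteq \Psi^\nu$
which states that each environment in $\Psi^\nu$ must appear in $\Psi^\mu$.
If $\Psi^\mu$ is represented by
$(\mathcal{X}_1^\mu, \ldots, \mathcal{X}_m^\mu)$ and $\Psi^\nu$ by
$(\mathcal{X}_1^\nu, \ldots, \mathcal{X}_m^\nu)$, then appropriate set
constraints for $\Psi^\mu \supseteq \Psi^\nu$ are:

\[ \mathcal{X}_i^\mu \supseteq \mathcal{X}_i^\nu \quad \text{for } i = 1..m \]

Now consider the environment constraint
$\Psi^\mu \supseteq \Psi^\nu[X_3 \mapsto cons(X_1, X_2)]$, corresponding to
an assignment statement $X_3 := cons(X_1, X_2)$. The values for the variables
different from $X_3$ remain unchanged. The value for $X_3$ is given by
treating $\Psi^\nu$ as a set mapping and applying it to $cons(X_1, X_2)$.
This can be expressed using set constraints as follows:

\[
\begin{array}{rcll}
\mathcal{X}_3^\mu & \supseteq & cons(\mathcal{X}_1^\nu, \mathcal{X}_2^\nu) & \\
\mathcal{X}_i^\mu & \supseteq & \mathcal{X}_i^\nu & \text{for } i \neq 3
\end{array}
\]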

A more complicated example is
$\Psi^\mu \supseteq \Psi^\nu[X_2 \mapsto car(X_3)]$, corresponding to an
assignment statement $X_2 := car(X_3)$. In essence, we wish to construct the
following set constraints:

\[
\begin{array}{rcll}
\mathcal{X}_2^\mu & \supseteq & car(\mathcal{X}_3^\nu) & \\
\mathcal{X}_i^\mu & \supseteq & \mathcal{X}_i^\nu & \text{for } i \neq 2
\end{array}
\]

However these set constraints are not faithful to the meaning of
$\Psi^\mu \supseteq \Psi^\nu[X_2 \mapsto car(X_3)]$. This is because
$car(X_3)$ provides an implicit restriction on the values for $X_3$ after the
assignment statement: they must all be of the form $cons(\cdots)$. To
appropriately modify the constraints so that they reflect this condition,
recall that the definition of the set based interpretation of an environment
expression $\Psi[X \mapsto t]$ is:

\[ \mathcal{I}(\Psi[X \mapsto t]) = \varrho[X \mapsto \varrho(t)] \quad \text{where } \varrho \text{ is } \mathcal{A}(\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright t\}). \]

That is, the definition first constructs a set environment
$\varrho = \mathcal{A}(\{\rho \in \mathcal{I}(\Psi) : \rho \triangleright t\})$
and then modifies this set environment to reflect the assignment $X := t$.
This indicates a two stage process, and the set constraints corresponding to
$\Psi^\mu \supseteq \Psi^\nu[X_2 \mapsto car(X_3)]$ are similarly constructed
in two stages. Specifically, let $\mathcal{X}_1, \ldots, \mathcal{X}_m$ be
new set variables, and the intention is that these variables will be used to
capture the "temporary" set environment
$\varrho = \mathcal{A}(\{\rho \in \mathcal{I}(\Psi^\nu) : \rho \triangleright car(X_3)\})$.
Now, corresponding to $\Psi^\mu \supseteq \Psi^\nu[X_2 \mapsto car(X_3)]$,
construct the following set constraints:

\[
\begin{array}{lrcll}
(1) & \mathcal{X}_3 & \supseteq & \{X_3 : X_3 \in cons(\top, \top) \wedge X_3 \in \mathcal{X}_3^\nu\} & \\
    & \mathcal{X}_i & \supseteq & \mathcal{X}_i^\nu & \text{for } i \neq 3 \\
(2) & \mathcal{X}_2^\mu & \supseteq & car(\mathcal{X}_3) & \\
    & \mathcal{X}_i^\mu & \supseteq & \mathcal{X}_i & \text{for } i \neq 2
\end{array}
\]

The first group of set constraints, labeled (1), corresponds to the
construction of set environment $\varrho$, and in essence restricts the
values of $X_3$ so that $car(X_3)$ is defined. The second group of set
constraints, labeled (2), updates the set for $X_2$ to $car(X_3)$, and
retains the sets for the other variables.
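The two stage construction for this assignment can be traced on concrete sets. The Python sketch below follows the simplified example constraints only, not the full translation; in particular it makes no attempt to propagate emptiness between variables. The dictionary representation of a set environment, the tuple encoding of values and the name `assign_car` are assumptions of the sketch.

```python
A, B, NIL = ("a",), ("b",), ("nil",)


def assign_car(env):
    """Set based effect of X2 := car(X3) on env (dict: variable -> set of values)."""
    # Stage (1): temporary environment in which car(X3) is defined,
    # i.e. X3 is restricted to values of the form cons(_, _).
    temp = dict(env)
    temp["X3"] = {v for v in env["X3"] if v[0] == "cons"}

    # Stage (2): X2 receives the first components; other sets are retained.
    result = dict(temp)
    result["X2"] = {v[1] for v in temp["X3"]}
    return result


if __name__ == "__main__":
    before = {"X1": {A}, "X2": {B}, "X3": {("cons", A, B), NIL}}
    print(assign_car(before))
```

In the example, `nil` is dropped from the set for `X3` in stage (1), which is exactly the restriction that the naive single stage constraints fail to express.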

The actual constraints used are slightly more complex than these examples
suggest. The main reason for this is that if $i \neq j$ then the set
variables $\mathcal{X}_i^\mu$ and $\mathcal{X}_j^\mu$ are essentially
independent of each other - there is no a priori reason why one cannot be
empty and the other non-empty. However, recall that, by definition, a set
environment $\varrho$ is subject to the following requirement: if $\varrho$
maps some program variable into the empty set, then it must map all program
variables into the empty set. Hence, if the set variables
$(\mathcal{X}_1^\mu, \ldots, \mathcal{X}_m^\mu)$ are used to represent a set
environment $\Psi^\mu$, then care must be taken to ensure that if
$\mathcal{X}_i^\mu$ is empty, for some $i$, then $\mathcal{X}_i^\mu$ is empty
for all $i$.


We now give the details of the set constraints. Since the treatment of
program terms will require the replacement of program variables by set
variables, first define that
$[X_1, \ldots, X_m \mapsto \mathcal{X}_1, \ldots, \mathcal{X}_m]$ denotes the
renaming substitution that maps $X_i$ into $\mathcal{X}_i$, so that
$t[X_1, \ldots, X_m \mapsto \mathcal{X}_1, \ldots, \mathcal{X}_m]$ is the
result of replacing each program variable $X_i$ by the set variable
$\mathcal{X}_i$. The set constraints for a program $P$ can now be defined by
translating the environment constraints of $P$ as follows.


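Definition 13 (Set Constraints) The set constraints $SC_P$ for a program
$P$ are constructed as follows:

(i) For each constraint $\Psi^\mu \supseteq \Psi^\nu$, $SC_P$ contains*:

\[ \mathcal{X}_i^\mu \supseteq \{X_i : \textstyle\bigwedge_{j=1..m} X_j \in \mathcal{X}_j^\nu\}, \quad i = 1..m. \]

(ii) For each constraint $\Psi^\mu \supseteq \top$, $SC_P$ contains:

\[ \mathcal{X}_i^\mu \supseteq \top, \quad i = 1..m. \]

(iii) For each constraint $\Psi^\mu \supseteq \Psi^\nu[X_l \mapsto t]$,
$SC_P$ contains:

\[
\begin{array}{rcll}
\mathcal{X}_i & \supseteq & \{X_i : defined(t) \wedge \bigwedge_{j=1..m} X_j \in \mathcal{X}_j^\nu\}, & i = 1..m \\
\mathcal{X}_i^\mu & \supseteq & \{X_i : X_l \in t[X_1, \ldots, X_m \mapsto \mathcal{X}_1, \ldots, \mathcal{X}_m] \wedge \bigwedge_{j \neq l} X_j \in \mathcal{X}_j\}, & i = 1..m
\end{array}
\]

where $\mathcal{X}_1, \ldots, \mathcal{X}_m$ are distinct new set variables
and $defined(t)$ denotes the conjunction of all quantified conditions
$s \in f(\top, \ldots, \top)$ such that $t$ has a subterm of the form
$f^{-1}_{(i)}(s)$.

(iv) For each constraint $\Psi^\mu \supseteq \Psi^\nu[cond]$, let $cond$ be
$conj_1 \vee \cdots \vee conj_n$ and $SC_P$ contains:

\[
\begin{array}{rcll}
\mathcal{X}_i & \supseteq & \{X_i : defined(cond) \wedge \bigwedge_{j=1..m} X_j \in \mathcal{X}_j^\nu\}, & i = 1..m \\
\mathcal{X}_i^\mu & \supseteq & \{X_i : translate(conj_k) \wedge \bigwedge_{j=1..m} X_j \in \mathcal{X}_j\}, & i = 1..m,\ k = 1..n
\end{array}
\]

where $\mathcal{X}_1, \ldots, \mathcal{X}_m$ are distinct new set variables,
$defined(cond)$ denotes the conjunction of all quantified conditions
$s \in f(\top, \ldots, \top)$ such that $cond$ has a subterm of the form
$f^{-1}_{(i)}(s)$, and $translate(conj_k)$ is defined to be the conjunction
consisting of the following quantified conditions:

• $s \in t[X_1, \ldots, X_m \mapsto \mathcal{X}_1, \ldots, \mathcal{X}_m]$ and
  $t \in s[X_1, \ldots, X_m \mapsto \mathcal{X}_1, \ldots, \mathcal{X}_m]$,
  for each condition of the form $s = t$ in $conj_k$.

• $s \wr t[X_1, \ldots, X_m \mapsto \mathcal{X}_1, \ldots, \mathcal{X}_m]$ and
  $t \wr s[X_1, \ldots, X_m \mapsto \mathcal{X}_1, \ldots, \mathcal{X}_m]$,
  for each condition of the form $\neg(s = t)$ in $conj_k$.

• $t \in f(\top, \ldots, \top)$, for each condition of the form $match_f(t)$
  in $conj_k$.

• $t \in \overline{f(\top, \ldots, \top)}$, for each condition of the form
  $\neg(match_f(t))$ in $conj_k$.

* We note that $\Psi^\mu \supseteq \Psi^\nu$ could alternatively be translated
into $\mathcal{X}_i^\mu \supseteq \mathcal{X}_i^\nu$, $i = 1..m$. Although
these alternative constraints are simpler and clearly desirable in practice,
we use the more complex constraints for presentational reasons because they
clarify relationships between the emptiness of the variables
$\mathcal{X}_1^\mu, \ldots, \mathcal{X}_m^\mu$.
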

(v)  For  each  constraint  '9'*  Q  [Ai  €  Bi.9^^,...,An  €  SCp  con¬ 

tains: 

Xt  3  A  i  =  l..m 

I  it=l..n  ) 

D 

The set constraints $SC_P$ correctly characterize $approx_P$ in the
following sense.


Proposition 20 Let $P$ be a program. Then, for all program points $\mu$,
$sba_P(\Psi^\mu)(X_i) = lm(SC_P)(\mathcal{X}_i^\mu)$, $i = 1..m$.

Proof: The first step of the proof establishes a strong correspondence
between models of $EC_P$ and models of $SC_P$. Specifically, if
$\mathcal{I}_{ec}$ is an interpretation of $EC_P$ and $\mathcal{I}_{sc}$ is
an interpretation of $SC_P$ then

    If (a) $\mathcal{I}_{sc}(\mathcal{X}_i^\mu) = \mathcal{I}_{ec}(\Psi^\mu)(X_i)$
           for all $\mu$ and $i$, and
       (b) $\mathcal{I}_{sc}(\mathcal{X}) = \mathcal{I}_{sc}(se)$ for all
           constraints $\mathcal{X} \supseteq se$ in $SC_P$ where the
           variable $\mathcal{X}$ is not of the form $\mathcal{X}_i^\mu$
    then $\mathcal{I}_{ec} \models EC_P$ iff $\mathcal{I}_{sc} \models SC_P$.      (6.13)

To prove property (6.13) let $\mathcal{I}_{ec}$ be a model of $EC_P$ and let
$\mathcal{I}_{sc}$ be a model of $SC_P$, and suppose that conditions (a) and
(b) are satisfied. Now, corresponding to each constraint in $EC_P$, a
collection of constraints is introduced into $SC_P$. It therefore suffices to
show that for each of the cases (i-v) in the construction of $SC_P$, the
environment constraint considered is satisfied by $\mathcal{I}_{ec}$ iff the
set constraints constructed are satisfied by $\mathcal{I}_{sc}$. Consider
each case in turn.

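In case (i), an environment constraint $\Psi^\mu \supseteq \Psi^\nu$ is
considered and set constraints
$\mathcal{X}_i^\mu \supseteq \{X_i : \bigwedge_{j=1..m} X_j \in \mathcal{X}_j^\nu\}$,
$i = 1..m$ are constructed. To prove this case, consider the following chain
of reasoning:

\[
\begin{array}{ll}
 & \mathcal{I}_{ec}(\Psi^\mu) \supseteq \mathcal{I}_{ec}(\Psi^\nu) \\
\text{iff} & \mathcal{I}_{ec}(\Psi^\mu)(X_i) \supseteq \mathcal{I}_{ec}(\Psi^\nu)(X_i) \\
\text{iff} & \mathcal{I}_{ec}(\Psi^\mu)(X_i) \supseteq \{\rho(X_i) : \rho \in \mathcal{I}_{ec}(\Psi^\nu)\} \\
\text{iff} & \mathcal{I}_{ec}(\Psi^\mu)(X_i) \supseteq \{\rho(X_i) : \rho(X_j) \in \mathcal{I}_{ec}(\Psi^\nu)(X_j),\ j = 1..m\} \\
\text{iff} & \mathcal{I}_{ec}(\Psi^\mu)(X_i) \supseteq \{\rho(X_i) : \rho(X_j) \in \mathcal{I}_{sc}(\mathcal{X}_j^\nu),\ j = 1..m\} \\
\text{iff} & \mathcal{I}_{sc}(\mathcal{X}_i^\mu) \supseteq \mathcal{I}_{sc}(\{X_i : \bigwedge_{j=1..m} X_j \in \mathcal{X}_j^\nu\})
\end{array}
\]

The first step in this chain follows from the pointwise ordering of the
comparison of set environments. The second and third steps follow from the
definition of set environments. The fourth step follows from the equality
$\mathcal{I}_{sc}(\mathcal{X}_i^\mu) = \mathcal{I}_{ec}(\Psi^\mu)(X_i)$. The
last step just follows from the definition of
$\mathcal{I}_{sc}(\{X_i : \bigwedge_{j=1..m} X_j \in \mathcal{X}_j^\nu\})$.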

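In case (ii), an environment constraint $\Psi^\mu \supseteq \top$ is
considered and set constraints $\mathcal{X}_i^\mu \supseteq \top$,
$i = 1..m$ are constructed. Clearly
$\mathcal{I}_{ec}(\Psi^\mu) \supseteq \mathcal{I}_{ec}(\top)$ iff
$\mathcal{I}_{ec}(\Psi^\mu)(X_i) = \{\text{all values}\}$ iff
$\mathcal{I}_{sc}(\mathcal{X}_i^\mu) \supseteq \mathcal{I}_{sc}(\top)$.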

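Now consider case (iii), and let $\varrho$ be the set environment
$\mathcal{A}(\{\rho \in \mathcal{I}_{ec}(\Psi^\nu) : \rho \triangleright t\})$.
Since condition (b) is satisfied, the set constraints
$\mathcal{X}_i \supseteq \{X_i : defined(t) \wedge \bigwedge_{j=1..m} X_j \in \mathcal{X}_j^\nu\}$,
$i = 1..m$ are guaranteed to be satisfied by $\mathcal{I}_{sc}$. Condition
(b) also implies that $\varrho(X_i) = \mathcal{I}_{sc}(\mathcal{X}_i)$ for
$i = 1..m$, as the following chain of reasoning demonstrates:

\[
\begin{array}{rcl}
\varrho(X_i) & = & \{\rho(X_i) : \rho \triangleright t \text{ and } \rho \in \mathcal{I}_{ec}(\Psi^\nu)\} \\
 & = & \{\rho(X_i) : \rho \triangleright t \text{ and } \rho(X_j) \in \mathcal{I}_{ec}(\Psi^\nu)(X_j),\ j = 1..m\} \\
 & = & \{\rho(X_i) : \rho \triangleright t \text{ and } \rho(X_j) \in \mathcal{I}_{sc}(\mathcal{X}_j^\nu),\ j = 1..m\} \\
 & = & \{\rho(X_i) : \rho(s) \text{ has form } f(\cdots) \text{ for each subterm } f^{-1}_{(k)}(s) \text{ of } t, \text{ and } \rho(X_j) \in \mathcal{I}_{sc}(\mathcal{X}_j^\nu) \text{ for } j = 1..m\} \\
 & = & \{\rho(X_i) : \rho(s) \in \mathcal{I}_{sc}(se) \text{ for each } s \in se \text{ in } defined(t), \text{ and } \rho(X_j) \in \mathcal{I}_{sc}(\mathcal{X}_j^\nu) \text{ for } j = 1..m\} \\
 & = & \mathcal{I}_{sc}(\{X_i : defined(t) \wedge \bigwedge_{j=1..m} X_j \in \mathcal{X}_j^\nu\}) \\
 & = & \mathcal{I}_{sc}(\mathcal{X}_i)
\end{array}
\]

The first equality is just the definition of $\varrho$. The second follows
from the fact that $\rho \in \mathcal{I}_{ec}(\Psi^\nu)$ iff
$\bigwedge_{j=1..m} \rho(X_j) \in \mathcal{I}_{ec}(\Psi^\nu)(X_j)$. The third
follows from hypothesis (b) of (6.13). The fourth follows from the
observation that the place in the definition of $\rho(t)$ where undefinedness
can be introduced is at the evaluation of projections. In particular, it is
easy to verify that $\rho(t)$ is defined iff for all subterms of $t$ that are
of the form $f^{-1}_{(k)}(s)$ it is the case that $\rho(s)$ is
$f(v_1, \ldots, v_n)$ for some values $v_i$. The fifth equality follows from
the definition of $defined$. The sixth equality follows from the definition
of $\mathcal{I}_{sc}$ on quantified set expressions. Finally, the seventh
equality follows from condition (b).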

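Having established that $\varrho(X_i) = \mathcal{I}_{sc}(\mathcal{X}_i)$, it
follows from a simple structural induction argument on any program term $t$,
that

\[ \varrho(t) = \mathcal{I}_{sc}(t[X_1, \ldots, X_m \mapsto \mathcal{X}_1, \ldots, \mathcal{X}_m]) \]

The proof for case (iii) can now be completed by the following chain of
reasoning:

\[
\begin{array}{ll}
 & \mathcal{I}_{ec}(\Psi^\mu) \supseteq \mathcal{I}_{ec}(\Psi^\nu[X_l \mapsto t]) \\
\text{iff} & \mathcal{I}_{ec}(\Psi^\mu) \supseteq \varrho[X_l \mapsto \varrho(t)] \\
\text{iff} & \mathcal{I}_{ec}(\Psi^\mu)(X_l) \supseteq \varrho(t), \text{ and } \mathcal{I}_{ec}(\Psi^\mu)(X_i) \supseteq \varrho(X_i),\ i \neq l \\
\text{iff} & \mathcal{I}_{sc}(\mathcal{X}_l^\mu) \supseteq \mathcal{I}_{sc}(t[X_1, \ldots, X_m \mapsto \mathcal{X}_1, \ldots, \mathcal{X}_m]), \text{ and } \mathcal{I}_{sc}(\mathcal{X}_i^\mu) \supseteq \mathcal{I}_{sc}(\mathcal{X}_i),\ i \neq l
\end{array}
\]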

Now  consider  case  (iv),  let  g  be  the  set  environment  A{{p  €  Iec('®'*)  : 
p  >  cond}),  and  write  cond  in  the  form  conji  V  •  •  •  V  conj^  where  each  conj\ 
is  a  conjunction  of  atomic  program  conditions.  Again  condition  (b)  implies 
that  the  constraints  Xi  2  {Xi :  defined{cond)  A  Ay=i..TO  Xj  €  X^},i  =  l.,m 
are  satisfied  by  Xgc-  Using  reasoning  identical  to  that  in  case  (iii),  it  is  easy 
to  verify  that  condition  (b)  also  implies  that  g(Xi)  =  and  hence 

g(t)  =  X,cit[Xi,...,Xn^Xi,...M) 

for  any  program  term  t.  Case  (iv)  can  now  be  established  using  the  following 
chain  of  reasoning. 
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Tec(^^)  2  A({p  €  Qip^e  conjy  V  •  •  •  V  conj^}) 

iff  /\  Iec(^'‘)(-^t)  2  :  P  €  e  A  p  oonj  1  V  •  •  •  V  con;„} 


iff  A  A  5  {p(Xi)  :  p  6  p  A  p  h,  conj*} 

k=l..n 


iff  A  A  2  |p(X.):pl=,«>nifc  A  A  p{Xi)eeiXi)\ 

t=l..m  Jk=l..n  ^  j=l..r  J 

“1 

:  translate{conj h)  A  A  ^ 


iff  A  A  2  |pTO:pK«>"ifc  A  A  PiXj)eXsciXj)\ 

i=l..mk=l..n  ^  j=l..r 


iff  A  A  W)  2  I. 

k=l..n 


The  first  four  steps  are  straightforward:  the  first  follows  from  the  defi¬ 
nition  of  A,  the  second  from  the  fact  that  p  condi  V  •  •  •  V  conj^  iff 
p  condk  for  some  k,  the  third  from  the  definition  of  p  6  p,  and  the 
fourth  from  Iec{^‘^)iXi)  =  I,c{A:P)  and  p(Xi)  =  Isd^i).  The  core  part 
of  the  proof  is  the  last  step.  To  prove  this,  let  a  abbreviate  the  renauning 
[Xi, . . .  ,Xm*-*Xi,. . . ,  AVn],  and  Consider  the  following  chain  of  equalities: 
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l/K-fi): 


p  conjig  and 
for. 


I  \ 

J  =  I-**/ 


=  ipiXi)  : 


p(s)  6  Q{t)  A  p{t)  €  e(«) 
3v (v  €  pit)  ^v^  q(s)) 
3«(w  €  e(«)  Aw#  p(f)) 
p(s)  has  form  /(•••) 


for  all  5  =  f  in  conj^ 
for  all  ->(s  =  <)  in  conj^ 
for  all  ->(s  =  <)  in  conjh 
for  all  matchf{s)  in  conji^ 


p{3)  is  not  of  form  /(•  •  •)  for  all  -•(matckf(s))  in  conjj^ 


for  j  =  l..r 


=  : 


p(s)  €  Xac{<f{^))  A  p{t)  €  ^(©■(a))  for  all  5  =  t  in  conjf^ 

3w  (w  €  X,e{a{t))  Aw#  pis))  for  all  ->(s  =  t)  in  conj^ 

3v  (v  €  Xae(<T(s))  Aw#  p(t))  for  all  ->(s  =  t)  in  conj/j, 

p(s)  €  Xae(/(T, . . . ,  T))  for  all  matchj(s)  in  conj^ 

p(s)  €  X,c{f{J, . . . ,  T))  for  all  ->{matchf{s))  in  conji^ 

p{Xj)eXac(Xj)  forj  =  l..r 


pis)  €  Xise)  for  all  s  €  ae  in  translateiconjii)^ 

=  -  piXi) :  3w  for  all  s  1 5e  in  translateiconj i,)  i 

piXj)eXaciXj){0Tj  =  l..r  i 

=  T,c  ^  A 

The  first  equality  in  this  chain  is  just  the  expansion  of  the  definition  of 
p  conjff,  noting  that  condk  is  just  a  conjunction  of  atomic  program 
conditions.  The  second  equality  holds  because: 


= 


for  all  s  t  se  in  translateiconj i^) 


•  it  has  previously  been  established  that  pit)  =  J«e(o(t))  for  any  pro¬ 
gram  term  t  (recall  the  at  <r  abbreviates  [Xi, . . .  yXm>-*^Xi,. . . ,  A'to]), 

•  $I_{sc}(f(\top, \ldots, \top))$ is the set of all values of the form $f(\cdots)$, and so the condition $\rho(s) \in I_{sc}(f(\top, \ldots, \top))$ is equivalent to the condition "$\rho(s)$ has form $f(\cdots)$", and

•  $I_{sc}(\overline{f(\top, \ldots, \top)})$ is the set of all values not of the form $f(\cdots)$, and so the condition $\rho(s) \in I_{sc}(\overline{f(\top, \ldots, \top)})$ is equivalent to the condition "$\rho(s)$ is not of form $f(\cdots)$".

The third equality follows immediately from the definition of translate, and finally, the fourth equality is just the definition of $I_{sc}$.

This completes the proof of (6.13). The remainder of the proof uses (6.13) to show the following two properties:

(1)  There is a model $I_{sc}$ of $SC_P$ such that $lm(EC_P)(\mu)(X_i) = I_{sc}(X_i^\mu)$.

(2)  There is a model $I_{ec}$ of $EC_P$ such that $I_{ec}(\mu)(X_i) = lm(SC_P)(X_i^\mu)$.


These two properties imply the proposition because from the first it follows that

$$lm(EC_P)(\mu)(X_i) = I_{sc}(X_i^\mu) \supseteq lm(SC_P)(X_i^\mu),$$

and from the second it follows that

$$lm(EC_P)(\mu)(X_i) \subseteq I_{ec}(\mu)(X_i) = lm(SC_P)(X_i^\mu),$$

and together these imply that $lm(EC_P)(\mu)(X_i) = lm(SC_P)(X_i^\mu)$.

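To prove property (1), define an interpretation $I_{sc}$ of $SC_P$ by

$$I_{sc}(X) = \begin{cases} lm(EC_P)(\mu)(X_i) & \text{if } X \text{ is } X_i^\mu \\ I_{sc}(se) & \text{if } X \text{ is not of the form } X_i^\mu \text{ and } X \supseteq se \text{ is in } SC_P \end{cases}$$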

This is well defined because each set variable $X$ appearing in $SC_P$ is either of the form $X_i^\mu$ or else it is one of the extra variables introduced in the
translation  of  environment  expressions  '^[cond]  or  In  the  latter 

case there is only one constraint of the form $X \supseteq se$, and se contains only variables of the form $X_i^\mu$. Hence this definition does yield an interpretation $I_{sc}$ of $SC_P$, and it only remains to verify that $I_{sc}$ is a model of $SC_P$. From the definition of $I_{sc}$, it is clear that the pair of models $lm(EC_P)$ and $I_{sc}$ satisfy conditions (a) and (b) of (6.13), and so (6.13) implies that $I_{sc} \models SC_P$.

To prove property (2), define an interpretation $I_{ec}$ of $EC_P$ by

$$I_{ec}(\mu)(X_i) = lm(SC_P)(X_i^\mu).$$

The fact that this does define an interpretation $I_{ec}$ of $EC_P$ is not obvious because an interpretation $I_{ec}$ of $EC_P$ must satisfy the following property for all $\mu$:

$$\text{if } \exists i\,(I_{ec}(\mu)(X_i) = \{\}) \text{ then } \forall i\,(I_{ec}(\mu)(X_i) = \{\})$$

where i ranges over 1..m. This requires that $lm(SC_P)$ satisfy:

$$\text{if } \exists i\,(lm(SC_P)(X_i^\mu) = \{\}) \text{ then } \forall i\,(lm(SC_P)(X_i^\mu) = \{\}) \tag{6.14}$$

Note that this property does not hold for an arbitrary model of $SC_P$; its proof uses special properties of least models. Specifically, Proposition 17 proves that if $v \in lm(SC_P)(X)$ then $SC_P$ contains a constraint $X \supseteq se$ such that $v \in lm(SC_P)(se)$.

Now, to prove (6.14), suppose that $lm(SC_P)(X_r^\mu) \neq \{\}$ for some r, $1 \le r \le m$. Then there exists some $v \in lm(SC_P)(X_r^\mu)$, and so by Proposition 17 there must be some constraint $X_r^\mu \supseteq se$ in $SC_P$ such that $v \in lm(SC_P)(se)$. Now, this constraint $X_r^\mu \supseteq se$ could have been introduced via any of steps (i-v) of the construction of $SC_P$. These possibilities are split into two cases.


In the first case, se is of the form $\{X_r : conj\}$ and $SC_P$ contains the constraints $X_i^\mu \supseteq \{X_i : conj\}$ for all i = 1..m (this case includes all constraints introduced in steps (i), (iii), (iv) and (v)). If $v \in lm(SC_P)(se)$, then by definition there exists an environment $\rho$ such that $\rho(X_r) = v$ and $\rho \in lm(SC_P)(conj)$, and it immediately follows that each $lm(SC_P)(\{X_i : conj\})$ is non-empty, and so $lm(SC_P)(X_j^\mu) \neq \{\}$, j = 1..m.

In the second case, the constraint is introduced by step (ii), and $SC_P$ contains the constraints $X_j^\mu \supseteq \top$ for all j = 1..m. Clearly $lm(SC_P)(X_j^\mu)$ is equal to the set of all values, j = 1..m.


This completes the proof of (6.14), and so the mapping $I_{ec}$ defined using $lm(SC_P)$ is in fact an interpretation of $EC_P$. It remains to show that $I_{ec}$ is a model of $EC_P$. Now, consider the constraints of the form $X \supseteq se$ in $SC_P$ where the variable X is not of the form $X_i^\mu$. By the construction of $SC_P$, if such a constraint is present, then it is the only lower bound for the variable X. Hence it follows from Proposition 18 that $lm(SC_P)(X) = lm(SC_P)(se)$, and so part (b) of (6.13) holds. Furthermore, part (a) holds because of the definition of $I_{ec}$. Thus from (6.13), $I_{ec} \models EC_P$. This completes the proof of property (2), and so the proposition is proved. []




Part  II 

Set  Based  Analysis 


Having defined set based program approximation, we now show how this approximation may be computed. We begin by translating the environment constraints into set constraints such that the least set based model of the environment constraints corresponds to the least model of the set constraints. We then present an algorithm for solving these set constraints.
The  output  of  the  algorithm  is  a  representation  of  the  least  model  of  the 
input  constraints  that  is  explicit  in  the  sense  that  structural  properties  of 
the  model  can  be  easily  inferred.  We  conclude  by  describing  a  prototype 
implementation  of  the  set  constraint  algorithm. 
Solving  Set  Constraints


In the previous chapter, the problem of computing the set based approximation of a program was reduced to computing the least model of a collection of set constraints. This chapter presents algorithms for constructing an
explicit  representation  of  the  least  model  of  such  set  constraints.  We  first 
address  the  issue  of  explicit  representation  using  regular  term  grammars. 
Then,  we  present  a  high  level  description  of  the  set  constraint  algorithm 
in  the  form  of  a  generic  algorithm.  The  remainder  of  the  chapter  presents 
two  progressively  more  complex  instances  of  this  algorithm.  The  second  of 
these  algorithms  proves  the  main  result  of  this  thesis:  set  based  program 
approximations  are  decidable  and  can  be  represented  using  regular  term 
grammars. 




7.1  Explicit  Representation  of  lm(C)

We  first  address  the  issue  of  what  kind  of  object  is  output  by  the  set  con¬ 
straint  algorithms  and  why  this  is  an  appropriate  explicit  representation. 
Recall  that  a  model  of  a  collection  of  set  constraints  is  a  mapping  that  as¬ 
sociates  a  set  of  values  to  each  set  variable.  Now,  clearly  such  sets  of  values 
may  be  infinite.  Therefore,  to  provide  a  description  of  the  least  model  of  a 
given  collection  of  set  constraints,  the  set  constraint  algorithm  must  output 
descriptions  of  sets  of  values,  one  for  each  set  variable.  The  descriptions 
output  by  our  algorithm  are  essentially  regular  term  grammars. 

Regular  Term  Grammars 

A regular term grammar $\mathcal{G}$ consists of a set $NT_{\mathcal{G}}$ of non-terminals, a set $\Sigma_{\mathcal{G}}$ of function symbols, each with a unique arity, and a finite set $\mathcal{P}_{\mathcal{G}}$ of productions. To define productions, first define that a term is either a non-terminal or of the form $f(t_1, \ldots, t_n)$ where f is an n-ary symbol from $\Sigma_{\mathcal{G}}$ and each $t_i$ is a term. Now, a production is of the form $nt \Rightarrow t$ such that nt is a non-terminal from $NT_{\mathcal{G}}$ and t is a term. Using the productions in $\mathcal{P}_{\mathcal{G}}$, a derivability relation on terms can be defined in the obvious way: $t_1 \Rightarrow t_2$ if there is a production $nt \Rightarrow t$ and $t_2$ is obtained from $t_1$ by replacing an occurrence of nt in $t_1$ by t. Let $\Rightarrow^*$ denote the transitive reflexive closure of $\Rightarrow$. The language corresponding to a non-terminal nt, denoted $\mathcal{L}(nt)$, is defined as follows:

$$\mathcal{L}(nt) = \{t : nt \Rightarrow^* t \text{ and } t \text{ does not contain non-terminal symbols}\}$$

For example, consider the grammar where non-terminals are list and int, the set of function symbols is {cons, nil, succ, zero}, and $\mathcal{P}$ consists of the productions

$int \Rightarrow succ(int)$
$int \Rightarrow zero$
$list \Rightarrow cons(int, list)$
$list \Rightarrow nil$

This grammar describes integers in successor-zero notation, and lists of integers. If S is a set of terms such that there is some regular term grammar $\mathcal{G}$ and non-terminal $nt \in NT_{\mathcal{G}}$ such that $S = \mathcal{L}(nt)$, then S is a regular set of terms and $(\mathcal{G}, nt)$ is called a description of S. As another example, consider the productions

$X \Rightarrow cons(c, d)$
$X \Rightarrow cons(d, c)$
$X \Rightarrow cons(X, X)$
$X \Rightarrow cons(cons(c, d), cons(d, c))$

where $NT_{\mathcal{G}} = \{X\}$ and $\Sigma_{\mathcal{G}} = \{cons, c, d\}$. Here $\mathcal{L}(X)$ consists of elements such as cons(c, d) and cons(d, c), as well as cons(cons(c, d), cons(c, d)) and cons(cons(c, d), cons(d, c)). Note that the last production is redundant.
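To make the definition of $\mathcal{L}(nt)$ concrete, the following Python sketch enumerates the ground terms derivable from a non-terminal within a bounded number of nested production applications. The encoding is an assumption made purely for illustration: terms are tuples `(symbol, arg1, ..., argn)`, non-terminals are strings, and the names `GRAMMAR`, `language` and `expand` are hypothetical.

```python
from itertools import product

# Assumed encoding: a term is a tuple (symbol, arg1, ..., argn);
# a non-terminal is a plain string. GRAMMAR maps non-terminals to
# the right-hand sides of their productions.
GRAMMAR = {
    "int": [("succ", "int"), ("zero",)],
    "list": [("cons", "int", "list"), ("nil",)],
}

def language(grammar, nt, depth):
    """Ground terms derivable from nt using at most `depth` nested productions."""
    if depth == 0:
        return set()
    result = set()
    for rhs in grammar[nt]:
        result |= expand(grammar, rhs, depth - 1)
    return result

def expand(grammar, term, depth):
    """Ground instances of `term`, expanding non-terminals to the given depth."""
    if isinstance(term, str):  # a non-terminal
        return language(grammar, term, depth)
    symbol, args = term[0], term[1:]
    choices = [expand(grammar, arg, depth) for arg in args]
    # product() over an empty list yields one empty tuple, so constants work too.
    return {(symbol, *combo) for combo in product(*choices)}
```

With a bound of 2, `language(GRAMMAR, "list", 2)` contains `nil` and `cons(zero, nil)`; larger bounds yield longer lists. The same function can be used to observe that the last production of the second example is redundant: the term it produces is already derivable from the first three productions.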

Regular term grammars are essentially equivalent to tree automata¹ (the natural generalization of finite state automata to terms). Tree automata can be divided into four classes, according to whether (a) they are deterministic or non-deterministic, and (b) whether they are root-to-frontier or frontier-to-root (that is, whether they start from the root and work towards the leaves of the tree, or vice-versa). The languages definable by non-deterministic root-to-frontier tree automata, non-deterministic frontier-to-root tree automata, deterministic frontier-to-root tree automata and regular term grammars are all equivalent. See [19] for further details. (Note that deterministic root-to-frontier tree automata are strictly less powerful. In particular they correspond to regular term grammars where the productions for each non-terminal involve distinct outermost function symbols. Specifically, if $nt \Rightarrow t$ and $nt \Rightarrow t'$ are distinct productions, then t must be of the form $f(\cdots)$ and t' must be of the form $f'(\cdots)$ such that $f \neq f'$.)

Regular term grammars provide a representation of a set of terms that
is  explicit  in  the  sense  that  there  are  straightforward  polynomial-time  al¬ 
gorithms  to  determine  membership  and  emptiness.  Furthermore,  there  are 
standard  algorithms  to  compute  the  intersection,  union,  complementation 
and  containment  of  regular  term  grammars.  In  short,  a  regular  term  gram¬ 
mar  description  of  a  set  of  terms  provides  a  presentation  of  the  set  that 
exposes  much  of  the  internal  structure  of  the  set. 

Regular term grammars can also be used to explicitly represent a set constraint interpretation. For this purpose, it is convenient to identify non-terminals with set variables. Then, a grammar $\mathcal{G}$ represents an interpretation $I$ if $\mathcal{L}(X) = I(X)$ for each set variable X. Clearly, only certain interpretations can be represented in this way. Now, the essence of the set constraint algorithm is to input a collection C of set constraints and output a regular term grammar description of $lm(C)$. Note that there is no a priori reason why the least model of C should be representable by a regular term grammar. The set constraint algorithm in fact provides a constructive proof that this is always the case.

¹The notions of "term" and "tree" are completely interchangeable in this context. Terms are simply labeled trees and vice versa. The use of "terms" in term grammars and "trees" in tree automata is largely historical.


Explicit  Form  Constraints 

Strictly  speaking,  the  output  of  the  set  constraint  algorithm  is  not  a  regular 
term  grammar,  but  rather  a  restrictive  class  of  set  constraints  that  essentially 
corresponds  to  a  regular  term  grammar.  To  define  this  class  of  constraints, 
we define the atomic set expressions, which are essentially set expressions that do not contain set operators of arity $n \ge 1$.


Definition 14 (Atomic Set Expressions) A set expression is atomic if it is constructed from set variables, function symbols, the special constants $\top$ and $\bot$, and set operators of arity 0. []

For example, $f(f^{-1}_{(1)}(X))$ and $X \cap Y$ are not atomic, but c, X and $f(c, X)$ are. In what follows, we shall reserve the letter a for atomic set expressions. Explicit form constraints can now be defined as follows.


Definition 15 (Explicit Form Constraints) A constraint $X \supseteq a$ is in explicit form if a is an atomic set expression that is not a set variable. A collection C of constraints is in explicit form if each constraint in C is in explicit form. []

The output of the algorithm is a collection of explicit form constraints. Such constraints C can be viewed as a regular term grammar $\mathcal{G}_C$ whose non-terminals are set variables and whose productions are $\{X \Rightarrow t : (X \supseteq t) \in C\}$. In general, this grammar $\mathcal{G}_C$ is not a regular term grammar because of the presence of constants such as $\top$, $\bot$ and $\overline{S}$. However $\mathcal{G}_C$ is a straightforward extension of the notion of regular term grammar in which the constants $\top$, $\bot$ and $\overline{S}$ denote their usual sets. Specifically, extend the definition of $\mathcal{L}$ to be

$$\mathcal{L}(X) = \{v \in I(t) : X \Rightarrow^* t \text{ and } t \text{ does not contain non-terminals}\}$$

where $I$ is some interpretation. Note the choice of $I$ is immaterial since t does not contain non-terminals (set variables) and so $I(t)$ is independent of $I$. Given such a definition, $\mathcal{G}_C$ and $lm(C)$ are equivalent in the following sense:

$$lm(C)(X) = \mathcal{L}(X) \text{ for each set variable } X \text{ appearing in } C.$$

Furthermore, even though $\mathcal{G}_C$ is not quite a regular term grammar, it is still an explicit form in the sense that questions about membership, non-emptiness, etc., can easily be answered. Hence C can be viewed as an explicit representation of its own least model because it is essentially a regular term grammar description of $lm(C)$. Constraints of the form $X \supseteq a$ where a is a non-variable set expression containing only constants, function symbols and set variables are therefore called explicit form constraints. Such constraints play a key role in the set constraint algorithm.


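Note that if $\Sigma$ is finite, then $\mathcal{G}_C$ can be treated as a regular term grammar because the values of the constants $\top$, $\bot$ and $\overline{S}$ can be represented by regular term grammars. A grammar for $\top$ can be constructed by treating $\top$ as a non-terminal with productions $\top \Rightarrow f(\top, \ldots, \top)$ for each $f \in \Sigma$. A grammar for $\bot$ can be constructed by treating $\bot$ as a non-terminal with no productions. Similarly there is a straightforward construction for each constant $\overline{S}$. For example, if $\Sigma$ is {c, d, f, g, h}, where c and d are constants, f and g are unary, and h is binary, then $\overline{\{f(\top), h(c,d)\}}$ can be represented by the regular term grammar:

$$\begin{array}{lll}
\overline{\{f(\top), h(c,d)\}} \Rightarrow c & \overline{c} \Rightarrow d & \overline{d} \Rightarrow c \\
\overline{\{f(\top), h(c,d)\}} \Rightarrow d & \overline{c} \Rightarrow f(\top) & \overline{d} \Rightarrow f(\top) \\
\overline{\{f(\top), h(c,d)\}} \Rightarrow g(\top) & \overline{c} \Rightarrow g(\top) & \overline{d} \Rightarrow g(\top) \\
\overline{\{f(\top), h(c,d)\}} \Rightarrow h(\overline{c}, \top) & \overline{c} \Rightarrow h(\top, \top) & \overline{d} \Rightarrow h(\top, \top) \\
\overline{\{f(\top), h(c,d)\}} \Rightarrow h(\top, \overline{d}) & &
\end{array}$$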
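The constructions for $\top$ and for the complement of a constant can be sketched in a few lines of Python. This is only an illustration under assumed encodings: a signature is a dictionary from symbol names to arities, productions are tuples `(symbol, arg1, ..., argn)`, and the helper names `top_productions` and `complement_of_constant`, as well as the non-terminal name `not_c`, are hypothetical.

```python
TOP = "TOP"  # assumed name for the non-terminal standing for the constant ⊤

def top_productions(signature):
    """Productions TOP => f(TOP, ..., TOP), one for each f in the signature."""
    return {TOP: [(f,) + (TOP,) * arity for f, arity in signature.items()]}

def complement_of_constant(signature, c):
    """Productions for the complement of the constant c.

    A value differs from c exactly when its root symbol is not c, so the
    complement has one production g(TOP, ..., TOP) for every other symbol g.
    """
    name = "not_" + c
    rules = [(g,) + (TOP,) * arity for g, arity in signature.items() if g != c]
    return name, {name: rules}
```

For the signature of the example above, `complement_of_constant(sig, "c")` yields the four productions listed for $\overline{c}$. A grammar for $\bot$ needs no code at all: it is a non-terminal with an empty list of productions.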


Some  Basic  Algorithms  on  Explicit  Form  Constraints 

We have already noted that regular term grammars provide a convenient representation of sets of terms because there are straightforward algorithms to determine membership and emptiness, as well as compute intersections and complementations, and these can be easily adapted to explicit form constraints. Since the set constraint algorithm presented in this chapter shall employ a number of these basic algorithms, we shall conclude this section by providing a brief outline of the necessary details. We stress that these basic algorithms are adaptations of known results (see, for example, [19]), and are included only for the sake of completeness.

We first consider the membership problem. That is, given an explicit form collection of constraints C, a set variable $X \in var(C)$ and a value v, we wish to determine whether $v \in lm(C)(X)$. Now, by treating C as a root-to-frontier tree automaton, we can just use the definition of acceptance for root-to-frontier tree automata to determine if $v \in lm(C)(a)$. However, this does not lead to a polynomial time algorithm because searching for an accepting computation requires trying all possible transitions at each computation step, and this can lead to exponential behavior. A polynomial time algorithm can be obtained by essentially treating C as a frontier-to-root automaton. Specifically, let $S_v$ denote all subterms of the given value v. Now, to each value $v' \in S_v$ and each atomic set expression a in C, associate a binary value $in(v', a)$. The intention is that $in(v', a)$ shall be true if $v' \in lm(C)(a)$. Now, it is easy to determine whether $v' \in lm(C)(a)$ if a is a ground atomic expression, and moreover this is independent of C. For example, if $\Sigma$ is {f, c} then $f(c,c) \in lm(C)(f(c,c))$ and $f(c,c) \in lm(C)(\overline{c})$ but $c \notin lm(C)(f(c,c))$ and $c \notin lm(C)(\overline{c})$. Hence, initialize $in(v', a)$ so that $in(v', a)$ is true if a is ground and $v' \in I(a)$ for all interpretations $I$, and false otherwise. Now, repeatedly update the values $in(v', a)$ using the following steps:

•  Set $in(v', a)$ to true if $v'$ is the result of replacing each X in a by $v_X$ and $in(v_X, X)$ is true for each $X \in var(a)$.

•  Set $in(v', X)$ to true if $X \supseteq a$ appears in C and $in(v', a)$ is true.

It is easy to verify that these updating steps terminate since $v'$ ranges over subterms of v, and a and X respectively range over all atomic set expressions and variables in C. Moreover, on termination, $in(v, a)$ iff $v \in lm(C)(a)$.
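The frontier-to-root membership test can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: values and function applications are tuples `(symbol, arg1, ..., argn)`, set variables are strings, the constants $\top$ and $\bot$ are encoded by the assumed names `TOP` and `BOT`, complement constants are omitted, and the function names are hypothetical. Instead of tabulating $in(v', a)$ for every atomic expression, the sketch records only the pairs $(v', X)$ and recomputes the match against a on demand.

```python
TOP, BOT = "TOP", "BOT"  # assumed encodings of the constants ⊤ and ⊥

def subterms(v):
    """All subterms of the value v, with children listed before parents."""
    found = []
    for child in v[1:]:
        found.extend(subterms(child))
    found.append(v)
    return found

def member(constraints, var, v):
    """Decide whether v is in lm(C)(var), for explicit form C = [(X, a), ...]."""
    inside = set()  # pairs (subterm, variable) already known to hold

    def matches(value, a):
        if a == TOP:
            return True
        if a == BOT:
            return False
        if isinstance(a, str):  # a set variable
            return (value, a) in inside
        return (value[0] == a[0] and len(value) == len(a)
                and all(matches(x, y) for x, y in zip(value[1:], a[1:])))

    changed = True
    while changed:  # repeat the updating steps until nothing new is added
        changed = False
        for sub in subterms(v):
            for x, a in constraints:
                if (sub, x) not in inside and matches(sub, a):
                    inside.add((sub, x))
                    changed = True
    return (v, var) in inside
```

Each pass either adds a new pair or stops, and there are at most (number of subterms) × (number of variables) pairs, which gives the polynomial bound.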

We next deal with non-emptiness. That is, given an explicit form collection of constraints C and a set variable $X \in var(C)$, we wish to determine whether $lm(C)(X) = \{\}$. The basic structure of the algorithm is the same as the membership algorithm. Again, it is easy to determine whether $lm(C)(a) = \{\}$ if a is a ground atomic expression. For example, if $\Sigma$ is {f, c} then $lm(C)(f(c,c))$, $lm(C)(\top)$, $lm(C)(f(\top,\top))$ are all non-empty, regardless of C, whereas $lm(C)(\bot)$ and $lm(C)(\overline{\{f(\top,\top), c\}})$ are always empty. (Note that the emptiness of $\overline{\{f(\top,\top), c\}}$ depends on $\Sigma$.) Now, associate a boolean value nonempty(a) with each atomic expression a appearing in C and initialize these values so that nonempty(a) is true if a is a ground atomic expression that is non-empty in all interpretations. Repeatedly update the values nonempty(a) using the following steps:

•  Set nonempty(a) to true if a is not ground and nonempty(X) is true for each $X \in var(a)$.

•  Set nonempty(X) to true if C contains $X \supseteq a$ and nonempty(a) is true.

It is easy to verify that these updating steps terminate, and that on termination, nonempty(a) is true iff $lm(C)(a) \neq \{\}$, for all atomic set expressions a appearing in C.
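The non-emptiness computation is a small fixed point iteration, sketched below in Python. The encoding is assumed for illustration (tuples for function applications, strings for set variables, `TOP` and `BOT` for $\top$ and $\bot$), complement constants are left out, and the signature is assumed to contain at least one constant so that $\top$ is non-empty. The name `nonempty_variables` is hypothetical.

```python
TOP, BOT = "TOP", "BOT"  # assumed encodings of the constants ⊤ and ⊥

def nonempty_variables(constraints):
    """Return the set variables X with lm(C)(X) != {} for explicit form C."""
    nonempty = set()

    def holds(a):
        """True if the atomic expression a is already known to be non-empty."""
        if a == TOP:
            return True
        if a == BOT:
            return False
        if isinstance(a, str):  # a set variable
            return a in nonempty
        # f(a1, ..., an) is non-empty exactly when every argument is.
        return all(holds(arg) for arg in a[1:])

    changed = True
    while changed:
        changed = False
        for x, a in constraints:
            if x not in nonempty and holds(a):
                nonempty.add(x)
                changed = True
    return nonempty
```

For instance, with the constraints $X \supseteq f(Y)$, $Y \supseteq c$ and $Z \supseteq f(Z)$, the variables X and Y are reported as non-empty while Z is not, since Z has no base case.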

Finally, we deal with singleton sets. That is, given an explicit form collection of constraints C and a set variable $X \in var(C)$, we wish to determine whether $lm(C)(a) = \{v\}$ for some value v. The algorithm for this property again follows the structure of the membership algorithm. It is easy to determine whether $lm(C)(a) = \{v\}$ for some value v if a is a ground atomic expression. For example, if $\Sigma$ is {f, c} then $lm(C)(c)$, $lm(C)(f(c,c))$ and $lm(C)(\overline{f(\top,\top)})$ are all singleton sets, but $lm(C)(\bot)$, $lm(C)(\top)$ and $lm(C)(\overline{c})$ are not. Now, consider mappings from each atomic expression a appearing in C into $\{v : v \text{ is a value}\} \cup \{\top, \bot\}$. Initially, let singleton denote the following mapping

$$singleton(a) = \begin{cases} \top & \text{if } a \text{ is ground and } |I(a)| \ge 2 \text{ for all interpretations } I \\ v & \text{if } a \text{ is ground and } I(a) = \{v\} \text{ for all interpretations } I \\ \bot & \text{otherwise} \end{cases}$$

where $|S|$ denotes the cardinality of the set S. Note that if a is atomic, then $I(a)$ is independent of $I$ iff a is ground. Now, repeatedly update the values singleton(a) using the following steps:

•  Set singleton(a) to v if a is not ground, singleton(X) is not $\bot$ or $\top$ for each $X \in var(a)$, and v is the result of replacing each X in a by singleton(X).

•  Set singleton(a) to $\top$ if a is not ground and $singleton(X) \neq \bot$ for each $X \in var(a)$ and $singleton(Y) = \top$ for some $Y \in var(a)$.

•  Set singleton(X) to singleton(a) if $X \supseteq a$ appears in C and $singleton(X) = \bot$.

•  Set singleton(X) to $\top$ if $X \supseteq a$ appears in C and $singleton(a) \neq singleton(X)$.

It is easy to verify that these updating steps terminate, and that on termination, singleton(a) is respectively $\bot$, v or $\top$ if $lm(C)(a) = \{\}$, $lm(C)(a) = \{v\}$ or $|lm(C)(a)| \ge 2$.
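The singleton computation can be read as a fixed point over a three-level ordering: empty, exactly one value, two or more values. The Python sketch below follows that reading under assumed encodings (tuples for values and function applications, strings for set variables, `TOP` and `BOT` both for the constants $\top$, $\bot$ and for the results "two or more" and "empty"). It assumes that the universe of values has at least two elements, omits complement constants, and combines results with a `join` so that a right-hand side that is still empty never changes the value recorded for a variable. All names are hypothetical.

```python
TOP, BOT = "TOP", "BOT"  # assumed encodings; also used as "many" and "empty"

def singleton(constraints):
    """Map each constrained variable to BOT (empty), a value (singleton) or TOP."""
    state = {x: BOT for x, _ in constraints}

    def evaluate(a):
        if a == TOP or a == BOT:
            return a
        if isinstance(a, str):  # a set variable
            return state.get(a, BOT)
        args = [evaluate(arg) for arg in a[1:]]
        if BOT in args:   # some argument is empty, so f(...) is empty
            return BOT
        if TOP in args:   # some argument has two or more values
            return TOP
        return (a[0], *args)  # every argument is a single value

    def join(p, q):
        if p == BOT:
            return q
        if q == BOT or p == q:
            return p
        return TOP  # two different values, or one side already TOP

    changed = True
    while changed:
        changed = False
        for x, a in constraints:
            new = join(state[x], evaluate(a))
            if new != state[x]:
                state[x] = new
                changed = True
    return state
```

Each variable can change at most twice (from empty to a single value, and from a single value to many), so the loop terminates after a number of passes that is linear in the number of variables.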


7.2  Overview  of  Algorithm 

At a high level, the execution of the algorithm for solving set constraints can be summarized as follows. Starting with the input collection of constraints $C_0$, a sequence of constraints $C_0, C_1, \ldots, C_i, \ldots$ is constructed such that each collection $C_i$ has essentially the same least model as $C_0$ and is obtained from its predecessor $C_{i-1}$ by adding some new constraints. The aim of adding these new constraints is to make the least model of $C_i$ more "explicit". To formalize this notion, first recall that explicit form constraints are of the form $X \supseteq a$ where a is a non-variable atomic expression, and that such constraints form the explicit representation that is output by the algorithm. Now, where C is a collection of constraints, let explicit(C) denote the explicit form constraints in C. In essence, explicit(C) corresponds to what has already been computed about the least model of C. Now, as the algorithm progresses, the constraints $C_i$ become more explicit in the sense that

$$lm(explicit(C_0)),\ lm(explicit(C_1)),\ \ldots,\ lm(explicit(C_i)),\ \ldots$$

is an increasing sequence of interpretations that converges towards $lm(C_0)$. When $lm(explicit(C_i))$ reaches $lm(C_0)$, the algorithm terminates and outputs $explicit(C_i)$.

The process of constructing $C_i$ from $C_{i-1}$ is defined by a collection of transformations. These transformations fall into two categories. First there are simplification transformations, which simplify expressions involving set operators to make the constraints more explicit. Such transformations are responsible for augmenting a constraint such as $X \supseteq f^{-1}_{(1)}(f(c))$ with the more explicit, but equivalent, constraint $X \supseteq c$. Second, there are substitution transformations, and these are responsible for performing substitutions so that simplification transformations can be applied. For example, consider the constraints $X \supseteq f^{-1}_{(1)}(Y)$, $Y \supseteq f(c)$. Here $Y \supseteq f(c)$ must be substituted into $X \supseteq f^{-1}_{(1)}(Y)$ to obtain $X \supseteq f^{-1}_{(1)}(f(c))$ before the projection can be simplified. Note that substitutions can be cyclic in the sense that substituting $Y \supseteq f(Y)$ into $X \supseteq f^{-1}_{(1)}(Y)$ involves replacing Y by f(Y), yielding the constraint $X \supseteq f^{-1}_{(1)}(f(Y))$. This means that substitutions must be carefully controlled to ensure termination. Moreover, they must be done sufficiently often to ensure that simplification transformations can be applied when they are needed.

To  facilitate  this  tradeoff,  constraints  are  maintained  by  the  algorithm 
in  a  special  form  described  as  follows. 

Definition 16 (Standard Form) A set expression is in standard form if it is either atomic or of the form $op(a_1, \ldots, a_n)$ such that each $a_i$ is atomic and $op \in OP$. A collection of constraints is in standard form if each constraint is of the form $X \supseteq se$ such that se is in standard form.

Not  only  does  the  use  of  standard  form  help  control  the  process  of  substitu¬ 
tion,  but  it  also  reduces  the  number  of  cases  that  need  to  be  considered  at 
various  points  in  the  algorithm. 

The rest of this chapter is organized as follows. First we show how
constraints  can  be  converted  to  standard  form.  Next,  we  present  the  core 
concepts  of  the  set  constraint  algorithm  in  the  form  of  a  generic  algorithm, 
parameterized  by  set  operators  and  corresponding  transformations.  Ab¬ 
stract  criteria  are  given  for  ensuring  that  a  particular  instance  of  the  generic 
algorithm  (specified  by  giving  the  set  operators  and  transformations)  is  cor¬ 
rect  and  terminates.  The  last  two  sections  of  this  chapter  describe  two 
instances  of  the  generic  algorithm.  The  first  deals  with  projections  and  in¬ 
tersections.  The  second  generalizes  the  first  by  dealing  with  quantified  set 
expressions, and is the core algorithm of this thesis. In particular, when given as input the set constraints $SC_P$ corresponding to a program P, this algorithm outputs an explicit representation of $lm(SC_P)$.
7.3  Converting  Constraints  to  Standard  Form


Recall that standard form constraints are of the form $X \supseteq se$ such that se is either an atomic set expression or of the form $op(a_1, \ldots, a_n)$ where op is a set operator and each $a_i$ is atomic. As an alternative characterization, first define that the set expression se' is a strict subexpression of the set expression se if se' is a subexpression of se that is different from se. Then, $X \supseteq se$ is in standard form if se is either atomic or of the form $op(\cdots)$ such that all strict subexpressions of se are atomic. Conversely, if a collection of constraints is not in standard form then it must contain a constraint $X \supseteq se$ such that either (a) se is $se_1 \cup se_2$ or else (b) se has a strict subexpression $se_{ns}$ that is not atomic. In case (b), we call the occurrence of the subexpression $se_{ns}$ in se a non-standard occurrence. For example, the constraint $X \supseteq f(op_1(c)) \cup op_1(op_2(X))$ has one occurrence of $\cup$ and two non-standard occurrences.

Constraints  can  be  converted  to  standard  form  by  incrementally  remov¬ 
ing  U  symbols  and  non-standard  occurrences  using  the  following  two  rewrite 
steps: 

(1)  Replace the constraint $X \supseteq se_1 \cup se_2$ by the two constraints $X \supseteq se_1$ and $X \supseteq se_2$.

(2)  If se has a non-standard occurrence $se_{ns}$, then replace the constraint $X \supseteq se$ by the two constraints $X \supseteq se'$, $Z \supseteq se_{ns}$ where Z is a new set variable and se' is the result of replacing the occurrence of $se_{ns}$ in se by Z.
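The two rewrite steps can be sketched as a worklist procedure in Python. This is an illustration under assumptions rather than the thesis's own formulation: set variables and the constants $\top$, $\bot$ are strings, function applications are tuples `("f", name, arg1, ...)`, set operators are tuples `("op", name, arg1, ...)`, unions are tuples `("union", left, right)`, and fresh variables are named `Z0`, `Z1`, ... on the assumption that these names do not already occur. Step (2) is applied only to outermost non-standard occurrences, which is one admissible choice among several.

```python
from itertools import count

def is_atomic(e):
    """Atomic: a variable or constant, or a function symbol over atomic arguments."""
    if isinstance(e, str):
        return True
    return e[0] == "f" and all(is_atomic(arg) for arg in e[2:])

def standardize(constraints):
    """Exhaustively apply rewrite steps (1) and (2) to a list of pairs (X, se)."""
    fresh = (f"Z{i}" for i in count())  # assumed not to clash with existing names
    todo, done = list(constraints), []
    while todo:
        x, se = todo.pop()
        if is_atomic(se):
            done.append((x, se))
        elif se[0] == "union":  # step (1): split the union
            todo += [(x, se[1]), (x, se[2])]
        else:
            for i, arg in enumerate(se[2:], start=2):
                if not is_atomic(arg):  # step (2): name the occurrence
                    z = next(fresh)
                    todo += [(x, se[:i] + (z,) + se[i + 1:]), (z, arg)]
                    break
            else:  # an operator applied to atomic arguments only
                done.append((x, se))
    return done
```

Applied to the example constraint $X \supseteq f(op_1(c)) \cup op_1(op_2(X))$, the procedure produces four standard form constraints over X and two new variables, one for each of the two non-standard occurrences.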

Each  step,  if  applicable,  rewrites  constraints  into  a  form  that  is  closer  to 
standard  form  in  the  sense  that  the  number  of  union  symbols  or  the  number 
of  non-standard  occurrences  decreases.  It  follows  that  the  repeated  applica¬ 
tion  of  these  steps  must  terminate.  When  C  is  a  collection  of  set  constraints, 
let  standardize(C)  denote  the  result  of  exhaustively  applying  steps  (1)  and 
(2). We now show that standardize(C) produces a standard form collection of constraints that essentially has the same least model as C. A formal statement of this must take into account the fact that standardize may introduce new variables. Hence, the preservation of the least model is defined with respect to a set of variables. Specifically, where $I$ and $I'$ are interpretations and var is a collection of variables, define that $I =_{var} I'$ if $I(X) = I'(X)$ for each $X \in var$. Then,
Proposition 21 (Standardize) Let C be a collection of constraints. Then standardize(C) is a standard form collection of constraints such that

$$lm(standardize(C)) =_{var(C)} lm(C).$$

Proof: The proposition follows from repeated application of the following fact:

if $C'$ is obtained from C by an application of step (1) or (2) then $lm(C) =_{var(C)} lm(C')$

To prove this fact, first consider the case where $C'$ is obtained by an application of step (1). The difference between C and $C'$ is that a constraint $X \supseteq se_1 \cup se_2$ has been replaced by two constraints $X \supseteq se_1$ and $X \supseteq se_2$. However, it is clear that for all interpretations $I$

$$I \models X \supseteq se_1 \cup se_2 \quad \text{iff} \quad I \models X \supseteq se_1 \wedge X \supseteq se_2$$

and it follows that $I \models C$ iff $I \models C'$. Hence $lm(C) = lm(C')$.

In the remaining case, $C'$ is obtained from C by an application of step (2). The difference between C and $C'$ is that a constraint $X \supseteq se$ in C is replaced by two constraints $X \supseteq se'$ and $Z \supseteq se_{ns}$ in $C'$ where $se_{ns}$ is an occurrence of an expression in se and se' is the result of replacing this occurrence by the new variable Z. It remains to prove that $lm(C)(Y) = lm(C')(Y)$ for all $Y \in var(C)$ and this is done in two parts.

The first part shows that $lm(C)(Y) \supseteq lm(C')(Y)$ for all $Y \in var(C)$. Let $I$ be the interpretation that maps Z into $lm(C)(se_{ns})$ and agrees with $lm(C)$ on all other set variables. By definition, $I(Z) = I(se_{ns})$, and from Proposition 16 it follows that $I(se) = I(se')$. Since C does not contain the set variable Z and $I$ agrees with $lm(C)$ except on Z, it must be the case that $I \models C$. Now, $X \supseteq se$ appears in C, and so $I(X) \supseteq I(se) = I(se')$. In summary, $I$ is a model of $Z \supseteq se_{ns} \wedge X \supseteq se'$, and moreover, $I$ is a model of all other constraints in $C'$ since $I \models C$. Thus $I \models C'$. This implies that $I \supseteq lm(C')$, and so $lm(C)(Y) = I(Y) \supseteq lm(C')(Y)$ for all $Y \in var(C)$. This completes the proof of the first part.

The second part shows that $lm(C)(Y) \subseteq lm(C')(Y)$. This is proved by showing that $lm(C')$ is a model of C. To prove this, first note that $Z \supseteq se_{ns}$ is the only lower bound for Z in $C'$, and so Proposition 18 implies that $lm(C')(Z) = lm(C')(se_{ns})$. Proposition 16 can now be applied to show that $lm(C')(se') = lm(C')(se)$. Moreover, since $lm(C')$ is a model of $X \supseteq se'$, it follows that $lm(C')$ is a model of $X \supseteq se$. Finally, $lm(C')$ is a model of $C - \{X \supseteq se\}$ because it is a model of $C'$. Hence $lm(C')$ is a model of C. This means that $lm(C) \subseteq lm(C')$, and this completes the proof of the second part. []

7.4  The  Generic  Algorithm 


We  present  a  high  level  description  of  solving  set  constraints  in  the  form 
of  a  generic  set  constraint  algorithm.  The  reason  for  doing  this  is  twofold. 
First,  the  details  of  the  set  constraint  algorithm  are  substantial,  and  so  the 
generic  algorithm  provides  a  way  to  explain  the  central  ideas  of  the  algorithm 
without  introducing  the  many  details  that  are  necessary  for  its  complete 
description.  Second,  the  general  structure  of  the  algorithm  appears  to  have 
wider  application  than  the  set  constraints  solved  in  this  thesis.  In  particular, 
the  set  based  analysis  of  a  program  involves  writing  set  constraints  using 
set  operators  corresponding  to  the  semantic  operations  of  the  language  - 
different  languages  require  different  set  operations.  The  generic  algorithm 
is  an  attempt  to  distill  the  concepts  that  are  likely  to  be  useful  during  the 
development  of  algorithms  for  the  set  operations  arising  in  future  work  on 
set  based  analysis. 

The generic algorithm is parameterized by a set $OP$ of set operations
that defines the class of set constraints on which it computes, and a set $\Delta$
of transformations, which define how these constraints may be simplified.
The computation of the generic algorithm may be characterized as follows.
Starting with an input collection of standard form constraints $C_0$, the
algorithm constructs a sequence of standard form constraints $C_0, C_1, \ldots, C_i, \ldots$ such
that (i) for each $i$, $lm(C_i) =_{var(C_0)} lm(C_0)$, (ii) each $C_i$ is obtained from its
predecessor $C_{i-1}$ through the application of one of the transformations in $\Delta$,
and (iii) each $C_i$ is more explicit than its predecessor in the sense that

$lm(explicit(C_0)), lm(explicit(C_1)), \ldots, lm(explicit(C_i)), \ldots$

is an increasing sequence of interpretations that converges towards $lm(C_0)$.
The algorithm terminates when $lm(explicit(C_i))$ reaches $lm(C_0)$.

A transformation is a function that maps from and into (finite) collections
of set constraints. The intention is that $\delta(C)$ denotes the constraints
that should be added to $C$ according to $\delta$. An instance of the generic
algorithm is defined by specifying a collection $\Delta$ of such transformations. These
transformations are exhaustively applied as outlined in Figure 7.1.

    input a collection $C_0$ of set constraints in standard form;
    $i$ := 0;
    while $\delta(C_i) \not\subseteq C_i$ for some transformation $\delta$ in $\Delta$ do
        $C_{i+1}$ := $\delta(C_i) \cup C_i$;
        $i$ := $i$ + 1;
    output the collection of explicit form constraints $explicit(C_i)$;

Figure 7.1: The Generic Algorithm
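
The loop of Figure 7.1 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the thesis: a constraint $\mathcal{X} \supseteq a$ is encoded as a pair of strings, upper-case strings stand for set variables, and `var_substitution` is a toy transformation chosen purely to exercise the loop.

```python
def generic_algorithm(c0, transformations, is_explicit):
    """Apply transformations until none of them yields a new constraint."""
    current = frozenset(c0)
    while True:
        for delta in transformations:
            added = frozenset(delta(current)) - current
            if added:                      # delta(C_i) is not a subset of C_i
                current = current | added  # C_{i+1} := delta(C_i) ∪ C_i
                break
        else:
            # No transformation produced anything new: output explicit(C_i).
            return frozenset(c for c in current if is_explicit(c))


def var_substitution(constraints):
    """Toy transformation (assumed for illustration): X ⊇ Y and Y ⊇ a give X ⊇ a."""
    return [(x, a) for (x, y) in constraints for (y2, a) in constraints if y == y2]


def is_explicit(constraint):
    """In this toy encoding, explicit means the right-hand side is not a variable."""
    return not constraint[1].isupper()


# X ⊇ Y and Y ⊇ c: the loop adds X ⊇ c and then stops.
result = generic_algorithm({("X", "Y"), ("Y", "c")}, [var_substitution], is_explicit)
```

The sketch picks the first transformation that adds something; the figure leaves the choice open, so any other selection order would fit it equally well.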

We now address general conditions for establishing the correctness of the
generic algorithm. First, each transformation must preserve the least model
of the constraints. Since a transformation may introduce new variables,
the preservation of the least model must be specified with respect to the
initial variables in $C_0$. Recall that $\mathcal{I} =_{var} \mathcal{I}'$ denotes that $\mathcal{I}(\mathcal{X}) = \mathcal{I}'(\mathcal{X})$ for
each $\mathcal{X} \in var$. Using this notation, the required preservation of least model
may be stated as $lm(C_i) =_{var(C_0)} lm(C_0)$ for each $i$. This condition can be
established if the transformations are sound in the following sense.


Definition 17 (Transformation Soundness) A collection of transformations
$\Delta$ is sound on constraints $C$ if $lm(C) =_{var(C)} lm(C \cup \delta(C))$ for all
transformations $\delta$ in $\Delta$. $\Box$

It is easy to prove that if $\Delta$ is sound on each $C_i$ constructed by the algorithm,
then the least model is preserved.


Lemma 9 (Least Model) If $\Delta$ is sound on each $C_i$ constructed by the
generic algorithm, then $lm(C_i) =_{var(C_0)} lm(C_0)$ for each $C_i$.

Proof: Since $\Delta$ is sound on each $C_i$, it follows that $lm(C_i) =_{var(C_i)} lm(C_{i+1})$.
Now clearly $var(C_0) \subseteq var(C_i)$ for each $C_i$, and so $lm(C_i) =_{var(C_0)} lm(C_{i+1})$.
The lemma then follows by chaining these facts together. $\Box$
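
Written out, the chaining step in this proof is the following sequence of equalities, each restricted to the variables of the input constraints. This is just an expanded restatement of the argument above, not an additional result:

$$
lm(C_0) =_{var(C_0)} lm(C_1) =_{var(C_0)} \cdots =_{var(C_0)} lm(C_{i-1}) =_{var(C_0)} lm(C_i)
$$

Each link $lm(C_k) =_{var(C_0)} lm(C_{k+1})$ is obtained by weakening $lm(C_k) =_{var(C_k)} lm(C_{k+1})$ using $var(C_0) \subseteq var(C_k)$, and agreement on a fixed set of variables is transitive.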

We now address termination. Although termination proofs tend to be
specific to a particular instance of the algorithm, some general observations
can be made. An important part of proving termination involves establishing
a bound on the number of atomic set expressions that can appear during the
generic algorithm's execution. Where $S$ is a set of atomic set expressions, let
$atomic(S)$ denote the superset of $S$ that consists of all atomic set expressions
that are subexpressions of elements of $S$. Also, where $C$ is a collection of
standard form constraints, define that $atomic(C)$ is the union of the following
sets:


• $atomic(\{\mathcal{X}, a_1, \ldots, a_n\})$ for all constraints $\mathcal{X} \supseteq op(a_1, \ldots, a_n)$
appearing in $C$, and

• $atomic(\{\mathcal{X}, a\})$ for all constraints $\mathcal{X} \supseteq a$ appearing in $C$.
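
As a small illustration of this definition, the sketch below computes $atomic(S)$ and $atomic(C)$ for a hypothetical encoding that is not taken from the thesis: a set variable is a string, a term $f(a_1, \ldots, a_n)$ is the tuple `("f", a1, ..., an)`, and a constraint is a pair of a left-hand variable and the list of atomic expressions on its right-hand side (one element for $\mathcal{X} \supseteq a$, several for $\mathcal{X} \supseteq op(a_1, \ldots, a_n)$).

```python
def atomic_closure(exprs):
    """atomic(S): every expression in S together with all of its subexpressions."""
    seen = set()
    stack = list(exprs)
    while stack:
        e = stack.pop()
        if e in seen:
            continue
        seen.add(e)
        if isinstance(e, tuple):      # a term f(a_1, ..., a_n): visit the arguments
            stack.extend(e[1:])
    return seen


def atomic_of_constraints(constraints):
    """atomic(C): union of atomic({X, a_1, ..., a_n}) over all constraints in C."""
    result = set()
    for lhs, rhs_atoms in constraints:
        result |= atomic_closure({lhs, *rhs_atoms})
    return result


# X ⊇ f(f(X)) and Z ⊇ X ∩ Y in the assumed encoding.
example = [("X", [("f", ("f", "X"))]), ("Z", ["X", "Y"])]
atoms = atomic_of_constraints(example)
```

For the two example constraints the result contains five atomic set expressions: the three variables together with `f(X)` and `f(f(X))`.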


Now, a collection of transformations $\Delta$ is atomically bounded if, for each
collection $C_0$, there is a finite bound $V(C_0)$ such that when the generic
algorithm is input $C_0$, the cardinality of $atomic(C_i)$ is bounded by $V(C_0)$ for
each $i$. This bound is proved by establishing an algorithm invariant about
the possible atomic set expressions that can appear during execution.

Typically,  the  remaining  part  of  the  termination  proof  involves  using  the 
bound  on  atomic  set  expressions  to  provide  a  bound  on  the  number  of  all 
set  expressions  that  can  appear  during  execution.  The  main  difficulty  here 
is  that  the  set  of  operations  OP  may  not  be  finite. 

Thus far, we have discussed termination and also conditions under which
the least model is preserved by each step of the algorithm. Now, the
algorithm does not output the final collection $C_i$, but rather $explicit(C_i)$, and so
it remains to show that $explicit(C_i)$ in fact describes $lm(C_i)$. This
requirement is essentially a completeness requirement on the transformations $\Delta$.
In other words, we need to show that the transformations are sufficiently
powerful that when no new constraints are produced, all the information
about the least model of $C_i$ is in $explicit(C_i)$. More formally, define that
a collection of transformations $\Delta$ is complete if, whenever the exhaustive
application of $\Delta$ terminates, the resulting $C_i$ constructed by the algorithm
is such that $lm(C_i) = lm(explicit(C_i))$. The following lemma immediately
follows from these definitions.

Lemma 10 (Completeness) If $\Delta$ is complete and the generic algorithm
terminates after $i$ iterations with output $C$, then $lm(C) = lm(C_i)$. $\Box$

Finally,  on  combining  all  of  these  observations: 

Theorem 7 (Correctness of Generic Algorithm)
Let $\Delta$ be a collection of transformations that is complete. Let $C_0$ be a
collection of standard form constraints and suppose that the instance of the
generic algorithm defined by $\Delta$ terminates on input $C_0$ and outputs explicit
form constraints $C_{out}$. If $\Delta$ is sound on each $C_i$ constructed by the algorithm
then $lm(C_{out}) =_{var(C_0)} lm(C_0)$.

Proof: Let $C_i$ denote the collection of constraints constructed during the
last iteration of the while-do loop. By Lemma 9, $lm(C_i) =_{var(C_0)} lm(C_0)$.
Finally, by Lemma 10, $lm(C_{out}) = lm(explicit(C_i)) = lm(C_i)$, and it follows
that $lm(C_{out}) =_{var(C_0)} lm(C_0)$. $\Box$

7.5  Intersection  and  Projection 


We now describe an instance of the generic algorithm for solving set
constraints involving projection and intersection. Specifically, the collection of
operators for these set constraints, denoted $OP_1$, shall consist of projections
$f^{-1}_{(i)}$ where $f$ is an $n$-ary symbol from $\Sigma$ and $1 \le i \le n$, and intersections $\cap_n$
where $n \ge 2$. We recall that the subscripts in intersections shall usually be
omitted and an expression $\cap_n(se_1, \ldots, se_n)$ shall be written as $se_1 \cap \cdots \cap se_n$.
For simplicity, we shall omit consideration of the constant symbol $\top$ from
this section. Figure 7.2 gives an example collection of set constraints
involving intersection and projection, along with the least model of the constraints.
In this example, $f$ and $c$ are function symbols of arity 1 and 0 respectively,
and $f^n$ is again used to abbreviate $n$ applications of $f$.

Now,  the  generic  algorithm  works  on  standard  form  constraints,  and  so 
the  first  step  in  solving  these  equations  is  to  put  them  into  standard  form 
using  the  algorithm  standardize.  The  resulting  standard  form  constraints 
appear  in  figure  7.3;  let  Co  denote  these  constraints. 

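    $\mathcal{X} \supseteq f(f(\mathcal{X})) \cup c$                          $\mathcal{X} \mapsto \{c, f^2(c), f^4(c), f^6(c), \ldots\}$

    $\mathcal{Y} \supseteq f(f(f(\mathcal{Y}))) \cup c$                       $\mathcal{Y} \mapsto \{c, f^3(c), f^6(c), f^9(c), \ldots\}$

    $\mathcal{Z} \supseteq (\mathcal{X} \cap \mathcal{Y}) \cup f^{-1}_{(1)}(\mathcal{Y})$      $\mathcal{Z} \mapsto \{c, f^2(c), f^5(c), f^6(c), f^8(c), \ldots\}$

Figure 7.2: Example Constraints and Their Least Model

    $\mathcal{X} \supseteq f(f(\mathcal{X}))$           $\mathcal{X} \supseteq c$

    $\mathcal{Y} \supseteq f(f(f(\mathcal{Y})))$        $\mathcal{Y} \supseteq c$

    $\mathcal{Z} \supseteq (\mathcal{X} \cap \mathcal{Y})$        $\mathcal{Z} \supseteq f^{-1}_{(1)}(\mathcal{Y})$

Figure 7.3: Constraints from Figure 7.2 Rewritten in Standard Form

Recall that explicit form constraints are of the form $\mathcal{X} \supseteq a$ such that
$a$ is a non-variable atomic set expression. This means that the constraints
$\mathcal{X} \supseteq f(f(\mathcal{X}))$, $\mathcal{X} \supseteq c$, $\mathcal{Y} \supseteq f(f(f(\mathcal{Y})))$ and $\mathcal{Y} \supseteq c$ in $C_0$ are already in
explicit form. However the constraints $\mathcal{Z} \supseteq (\mathcal{X} \cap \mathcal{Y})$ and $\mathcal{Z} \supseteq f^{-1}_{(1)}(\mathcal{Y})$ are
not in explicit form and, moreover, $lm(explicit(C_0)) \neq lm(C_0)$. Hence the
explicit constraints in $C_0$ do not characterize the least model of $C_0$.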
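
The least model shown in Figure 7.2, and the gap between $lm(C_0)$ and $lm(explicit(C_0))$, can be checked on a bounded fragment. The sketch below is not the thesis algorithm: it is a naive fixed point iteration that only works here because the example is unary, so the term $f^n(c)$ can be encoded as the integer $n$ and cut off at an assumed bound `BOUND`.

```python
BOUND = 12  # assumed cut-off: only terms f^n(c) with n <= BOUND are tracked


def least_model(use_all_constraints):
    """Iterate the constraints of Figure 7.3 to a fixed point, up to BOUND."""
    X, Y, Z = set(), set(), set()
    while True:
        new_x = X | {0} | {n + 2 for n in X if n + 2 <= BOUND}  # X ⊇ c, X ⊇ f(f(X))
        new_y = Y | {0} | {n + 3 for n in Y if n + 3 <= BOUND}  # Y ⊇ c, Y ⊇ f(f(f(Y)))
        new_z = set(Z)
        if use_all_constraints:
            new_z |= X & Y                         # Z ⊇ X ∩ Y
            new_z |= {n - 1 for n in Y if n >= 1}  # Z ⊇ f^{-1}_(1)(Y)
        if (new_x, new_y, new_z) == (X, Y, Z):
            return X, Y, Z
        X, Y, Z = new_x, new_y, new_z


full_x, full_y, full_z = least_model(use_all_constraints=True)
_, _, explicit_z = least_model(use_all_constraints=False)
```

With the two non-explicit constraints included, `full_z` is `{0, 2, 5, 6, 8, 11, 12}`, matching the pattern $c, f^2(c), f^5(c), f^6(c), f^8(c), \ldots$ of the figure. With only the explicit constraints, $\mathcal{Z}$ stays empty, which is the sense in which $explicit(C_0)$ fails to characterize the least model.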

As a first step towards making these constraints more explicit, consider
$\mathcal{Z} \supseteq f^{-1}_{(1)}(\mathcal{Y})$. Now, the constraints for $\mathcal{Y}$ are $\mathcal{Y} \supseteq c$ and $\mathcal{Y} \supseteq f(f(f(\mathcal{Y})))$.
The first says that $\mathcal{Y}$ must contain $c$ and the second says that $\mathcal{Y}$ must
contain all terms of the form $f(f(f(y)))$ such that $y \in \mathcal{Y}$. Combining
$\mathcal{Z} \supseteq f^{-1}_{(1)}(\mathcal{Y})$ with $\mathcal{Y} \supseteq c$ implies that $\mathcal{Z}$ must contain $f^{-1}_{(1)}(c)$, but this is simply
the empty set, and can be ignored. Combining $\mathcal{Z} \supseteq f^{-1}_{(1)}(\mathcal{Y})$ with
$\mathcal{Y} \supseteq f(f(f(\mathcal{Y})))$ implies that $\mathcal{Z}$ must contain $f^{-1}_{(1)}(f(f(f(y))))$ for each $y \in \mathcal{Y}$,
and this reduces to $f(f(y))$ for each $y \in \mathcal{Y}$. This can be expressed as
the constraint $\mathcal{Z} \supseteq f(f(\mathcal{Y}))$. In essence, by substituting the explicit form
constraints for $\mathcal{Y}$ into the right hand side of $\mathcal{Z} \supseteq f^{-1}_{(1)}(\mathcal{Y})$ and simplifying
the resulting expression, we have obtained a new explicit form constraint
for $\mathcal{Z}$. In so doing, we have made the constraints more explicit in the sense
that the explicit form constraints now contain more information about the
least model of the constraints.

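Now consider the constraint $\mathcal{Z} \supseteq (\mathcal{X} \cap \mathcal{Y})$. Since $\mathcal{X} \supseteq c$ and $\mathcal{Y} \supseteq c$, it is
clear that both $\mathcal{X}$ and $\mathcal{Y}$ must contain $c$, and so $\mathcal{Z}$ must contain $c$. This can
be expressed by the constraint $\mathcal{Z} \supseteq c$. However there are other constraints
for $\mathcal{X}$ and $\mathcal{Y}$. Consider $\mathcal{X} \supseteq f(f(\mathcal{X}))$ and $\mathcal{Y} \supseteq f(f(f(\mathcal{Y})))$. These constraints
imply that $\mathcal{Z} \supseteq f(f(\mathcal{X})) \cap f(f(f(\mathcal{Y})))$. The expression $f(f(\mathcal{X})) \cap f(f(f(\mathcal{Y})))$
can be simplified into $f(f(\mathcal{X}) \cap f(f(\mathcal{Y})))$ and then further simplified into
$f(f(\mathcal{X} \cap f(\mathcal{Y})))$, and so the constraint

$\mathcal{Z} \supseteq f(f(\mathcal{X} \cap f(\mathcal{Y})))$

is obtained. However this constraint is not in standard form. It can be
made into standard form by introducing a new variable, say $\mathcal{V}$, and writing
$\mathcal{Z} \supseteq f(f(\mathcal{V}))$ and $\mathcal{V} \supseteq \mathcal{X} \cap f(\mathcal{Y})$. In other words, substituting explicit form
constraints for $\mathcal{X}$ and $\mathcal{Y}$ (namely $\mathcal{X} \supseteq f(f(\mathcal{X}))$ and $\mathcal{Y} \supseteq f(f(f(\mathcal{Y})))$) into
$\mathcal{Z} \supseteq (\mathcal{X} \cap \mathcal{Y})$ and simplifying leads to new constraints $\mathcal{Z} \supseteq f(f(\mathcal{V}))$ and
$\mathcal{V} \supseteq \mathcal{X} \cap f(\mathcal{Y})$. The first is in explicit form, but the second is not. In short,
the substitution and simplification process has made the constraints more
explicit. However, in the process we have introduced another constraint and
so we must repeat the steps just outlined to simplify this new constraint.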

In essence this process of substitution using explicit form constraints
and then simplifying the resulting expressions forms the basis of the set
constraint algorithms. However there are two difficulties. First, the
substitution process must be carefully controlled. For example, consider the
constraint $\mathcal{X} \supseteq f(\mathcal{X})$. Now, this constraint could be substituted into itself
to obtain $\mathcal{X} \supseteq f(f(\mathcal{X}))$ and subsequent substitution steps could lead to
$\mathcal{X} \supseteq f^3(\mathcal{X})$, $\mathcal{X} \supseteq f^4(\mathcal{X})$ etc. Clearly this process can continue for ever. To
prevent this, substitution must be restricted so that it does not increase the
size of atomic set expressions. For example, substitutions into expressions
such as $\mathcal{X} \cap \mathcal{Y}$ and $f^{-1}_{(1)}(\mathcal{Y})$ are allowed, but substitutions into expressions
such as $f(\mathcal{Y})$ or $f(f(\mathcal{X}))$ are not allowed. Importantly, this restricted form
of substitution is sufficient.

The  second  difficulty  is  that  the  operation  of  intersection  produces  new 
constraints  that  must  be  simplified.  In  particular  it  introduces  new  variables. 
The  introduction  of  these  new  variables  must  be  controlled  using  a  special 
naming  scheme  to  ensure  that  the  algorithm  terminates. 


Transformations 

We  now  give  the  details  of  the  transformations.  Each  transformation  takes  a 
collection  of  standard  form  constraints  C  as  input,  and  outputs  one  or  more 
constraints.  The  first  two  transformations  deal  with  substitution,  and  the 
third  deals  with  simplifying  projections. 




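Transformation 1 (Op-Substitution) If $C$ contains the two constraints
$\mathcal{X} \supseteq op(a_1, \ldots, a_{i-1}, \mathcal{Y}, a_{i+1}, \ldots, a_n)$ and $\mathcal{Y} \supseteq a$ where $a$ is atomic, then
output $\mathcal{X} \supseteq op(a_1, \ldots, a_{i-1}, a, a_{i+1}, \ldots, a_n)$. $\Box$

Transformation 2 (Var-Substitution) If $C$ contains $\mathcal{X} \supseteq \mathcal{Y}$ and $\mathcal{Y} \supseteq a$
where $a$ is atomic, then output $\mathcal{X} \supseteq a$. $\Box$

Transformation 3 (Projection) If $C$ contains $\mathcal{X} \supseteq f^{-1}_{(i)}(f(a_1, \ldots, a_n))$
and $lm(explicit(C))(a_j) \neq \{\}$ for each $j \neq i$, then output $\mathcal{X} \supseteq a_i$. $\Box$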

Note the condition $lm(explicit(C))(a_j) \neq \{\}$ in the projection
transformation. To see why this is needed, suppose that $C$ consists of the single
constraint $\mathcal{X} \supseteq f^{-1}_{(2)}(f(\mathcal{Y}, c))$. The least model of this constraint maps both $\mathcal{X}$
and $\mathcal{Y}$ into the empty set. However, if the side condition is not present on
the projection transformation, then the constraint $\mathcal{X} \supseteq c$ would be added
and this would alter the least model.
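
The side condition can be made concrete with a short sketch. The encoding is an assumption made for illustration only: variables are strings, a term $f(a_1, \ldots, a_n)$ is the tuple `("f", a1, ..., an)`, and an explicit form constraint $\mathcal{X} \supseteq a$ is the pair `(X, a)`. The helper names are hypothetical.

```python
def atom_nonempty(atom, nonempty_vars):
    """True if the atomic expression denotes a non-empty set."""
    if isinstance(atom, str):                       # a set variable
        return atom in nonempty_vars
    return all(atom_nonempty(arg, nonempty_vars) for arg in atom[1:])


def nonempty_variables(explicit):
    """Variables that are non-empty in the least model of the explicit constraints."""
    found = set()
    changed = True
    while changed:
        changed = False
        for var, atom in explicit:
            if var not in found and atom_nonempty(atom, found):
                found.add(var)
                changed = True
    return found


def projection(x, i, term, explicit):
    """Transformation 3 for X ⊇ f^{-1}_(i)(term): returns (x, a_i), or None."""
    args = term[1:]
    found = nonempty_variables(explicit)
    others = [a for j, a in enumerate(args, start=1) if j != i]
    if all(atom_nonempty(a, found) for a in others):
        return (x, args[i - 1])
    return None


term = ("f", "Y", ("c",))                           # f(Y, c)
blocked = projection("X", 2, term, explicit=set())  # Y is empty: nothing is output
allowed = projection("X", 2, term, explicit={("Y", ("c",))})
```

In the first call $\mathcal{Y}$ has no explicit constraint, so the rule produces nothing and the least model is left alone. Once $\mathcal{Y} \supseteq c$ is available, the same call yields $\mathcal{X} \supseteq c$.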

The final transformation deals with simplifying intersection. As noted
before, one difficulty with simplifying intersections is that new variables
need to be introduced, and this leads to termination problems. To
overcome this problem, a special naming scheme is used for these new variables.
Specifically, the introduced variables are of the form $\mathcal{V}_N$ where $N$ is a finite
subset of the atomic set expressions that appear in $C_0$ (the input collection
of constraints). Call such variables intersection variables. The intention is
that an intersection variable $\mathcal{V}_{\{a_1, \ldots, a_n\}}$ should be equivalent to the
expression $a_1 \cap \cdots \cap a_n$. All of the variables of the form $\mathcal{V}_N$ are assumed to be distinct
variables that do not appear in the input constraints $C_0$. Define a function
$\mathcal{N}$ that maps from such a variable back into the atomic set expressions it
represents:

$\mathcal{N}(a) = N$ if $a$ is the intersection variable $\mathcal{V}_N$, and $\mathcal{N}(a) = \{a\}$ otherwise.

The intersection transformation can now be defined.


Transformation 4 (Intersection) If $C$ contains $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m$ such
that for some $f \in \Sigma$, each $a_i$ is of the form $f(a_{i,1}, \ldots, a_{i,n})$, then let
$N_j = \bigcup_{i=1..m} \mathcal{N}(a_{i,j})$, $j = 1..n$, and output the constraints

• $\mathcal{X} \supseteq f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n})$, and

• $\mathcal{V}_{N_j} \supseteq a_{1,j} \cap \cdots \cap a_{m,j}$, for each $\mathcal{V}_{N_j} \notin var(C)$. $\Box$

Strictly speaking, this transformation takes the input constraints $C_0$ as an
implicit parameter since the treatment of the variables $\mathcal{V}_N$ requires
knowledge about $C_0$; however, for notational convenience this extra parameter shall
be suppressed. We note that the treatment of the intersection variables $\mathcal{V}_N$
is analogous to the subset construction used in the conversion of a
non-deterministic finite state automaton to a deterministic finite state
automaton.
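
A sketch of the intersection transformation may help make the naming scheme concrete. It rests on an encoding that is assumed here rather than taken from the thesis: ordinary variables are strings, terms are tuples `("f", a1, ..., an)`, an intersection variable $\mathcal{V}_N$ is represented directly by the `frozenset` $N$, and the hypothetical class `Meet` holds an intersection $a_1 \cap \cdots \cap a_m$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meet:
    """Right-hand side a_1 ∩ ... ∩ a_m (hypothetical encoding)."""
    parts: tuple


def names(atom):
    """The function N: an intersection variable V_N maps to N, anything else to {atom}."""
    return atom if isinstance(atom, frozenset) else frozenset([atom])


def intersection(x, atoms, known_vars):
    """Constraints output for X ⊇ a_1 ∩ ... ∩ a_m, or [] if the rule does not apply."""
    if not all(isinstance(a, tuple) for a in atoms):
        return []
    if len({(a[0], len(a)) for a in atoms}) != 1:
        return []                                    # the a_i do not share one symbol f
    f, arity = atoms[0][0], len(atoms[0]) - 1
    new_args, extra = [], []
    for j in range(1, arity + 1):
        column = tuple(a[j] for a in atoms)          # a_{1,j}, ..., a_{m,j}
        n_j = frozenset().union(*(names(a) for a in column))
        new_args.append(n_j)
        if n_j not in known_vars:                    # V_{N_j} does not yet occur in C
            extra.append((n_j, Meet(column)))
    return [(x, (f, *new_args))] + extra


fX = ("f", "X")                                      # f(X)
ffY = ("f", ("f", "Y"))                              # f(f(Y))
# Z ⊇ f(f(X)) ∩ f(f(f(Y)))
out = intersection("Z", [("f", fX), ("f", ffY)], known_vars=set())
```

Here `out` holds the two constraints $\mathcal{Z} \supseteq f(\mathcal{V}_{\{f(\mathcal{X}), f(f(\mathcal{Y}))\}})$ and $\mathcal{V}_{\{f(\mathcal{X}), f(f(\mathcal{Y}))\}} \supseteq f(\mathcal{X}) \cap f(f(\mathcal{Y}))$. Because the variable is named by the set it stands for, meeting the same set again reuses the existing variable instead of creating a fresh one, which is the point of the comparison with the subset construction.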

Let $\Delta_1$ denote Transformations 1-4. (Strictly speaking, Transformations
1-4 are schemas for transformations, and $\Delta_1$ consists of all instances of these
four transformation schemas. We shall frequently blur this distinction.)
Now, define that the intersection-projection algorithm inputs constraints $C$
and first converts $C$ into standard form constraints $C_0$ and then exhaustively
applies the transformations $\Delta_1$ to $C_0$ as outlined by the generic algorithm.
As an example of the execution of the algorithm, recall the example set
constraints given in Figure 7.2. The execution of the algorithm on these
constraints is traced in Figure 7.4 (the original constraints, in standard
form, appear in the left hand column). In this example execution, we have
given preference to lower numbered transformations when more than one
transformation is applicable. The explicit form constraints are marked with
an asterisk, and each new constraint is marked with either (1), (2), (3) or (4) to
indicate the transformation used. We remark that, for efficiency reasons, it
is appropriate to remove duplicate expressions when adding new constraints
involving intersection. For example, instead of adding $\mathcal{Z} \supseteq c \cap c$, one could
immediately simplify this into $\mathcal{Z} \supseteq c$. However, in this section we shall strive
for a simple presentation of the algorithm, and so a number of
straightforward modifications relating to efficiency, such as this one involving deletion
of duplicate expressions in intersections, shall be omitted.


Correctness 

The  proof  of  the  correctness  of  this  algorithm  follows  the  outline  given  for  the 
generic  algorithm  in  the  previous  section.  We  begin  by  proving  an  important 
invariant  of  the  execution  of  the  algorithm. 
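
Left hand column (original constraints, in standard form):

    $\mathcal{X} \supseteq f(f(\mathcal{X}))$ *
    $\mathcal{X} \supseteq c$ *
    $\mathcal{Y} \supseteq f(f(f(\mathcal{Y})))$ *
    $\mathcal{Y} \supseteq c$ *
    $\mathcal{Z} \supseteq f^{-1}_{(1)}(\mathcal{Y})$
    $\mathcal{Z} \supseteq \mathcal{X} \cap \mathcal{Y}$

Right hand column (constraints added during execution):

    $\mathcal{Z} \supseteq f^{-1}_{(1)}(c)$   (1)
    $\mathcal{Z} \supseteq f^{-1}_{(1)}(f(f(f(\mathcal{Y}))))$   (1)
    $\mathcal{Z} \supseteq c \cap c$   (1)
    $\mathcal{Z} \supseteq c \cap f(f(f(\mathcal{Y})))$   (1)
    $\mathcal{Z} \supseteq f(f(\mathcal{X})) \cap c$   (1)
    $\mathcal{Z} \supseteq f(f(\mathcal{X})) \cap f(f(f(\mathcal{Y})))$   (1)
    $\mathcal{Z} \supseteq f(f(\mathcal{Y}))$ *   (3)
    $\mathcal{Z} \supseteq c$ *   (4)
    $\mathcal{Z} \supseteq f(\mathcal{V}_{\{f(\mathcal{X}), f(f(\mathcal{Y}))\}})$ *   (4)
    $\mathcal{V}_{\{f(\mathcal{X}), f(f(\mathcal{Y}))\}} \supseteq f(\mathcal{X}) \cap f(f(\mathcal{Y}))$   (4)
    $\mathcal{V}_{\{f(\mathcal{X}), f(f(\mathcal{Y}))\}} \supseteq f(\mathcal{V}_{\{\mathcal{X}, f(\mathcal{Y})\}})$ *   (4)
    $\mathcal{V}_{\{\mathcal{X}, f(\mathcal{Y})\}} \supseteq \mathcal{X} \cap f(\mathcal{Y})$   (4)
    $\mathcal{V}_{\{\mathcal{X}, f(\mathcal{Y})\}} \supseteq c \cap f(\mathcal{Y})$   (1)
    $\mathcal{V}_{\{\mathcal{X}, f(\mathcal{Y})\}} \supseteq f(f(\mathcal{X})) \cap f(\mathcal{Y})$   (1)
    $\mathcal{V}_{\{\mathcal{X}, f(\mathcal{Y})\}} \supseteq f(\mathcal{V}_{\{f(\mathcal{X}), \mathcal{Y}\}})$ *   (4)
    $\mathcal{V}_{\{f(\mathcal{X}), \mathcal{Y}\}} \supseteq f(\mathcal{X}) \cap \mathcal{Y}$   (4)
    $\mathcal{V}_{\{f(\mathcal{X}), \mathcal{Y}\}} \supseteq f(\mathcal{X}) \cap c$   (1)
    $\mathcal{V}_{\{f(\mathcal{X}), \mathcal{Y}\}} \supseteq f(\mathcal{X}) \cap f(f(f(\mathcal{Y})))$   (1)
    $\mathcal{V}_{\{f(\mathcal{X}), \mathcal{Y}\}} \supseteq f(\mathcal{V}_{\{\mathcal{X}, f(f(\mathcal{Y}))\}})$ *   (4)
    $\mathcal{V}_{\{\mathcal{X}, f(f(\mathcal{Y}))\}} \supseteq \mathcal{X} \cap f(f(\mathcal{Y}))$   (4)
    $\mathcal{V}_{\{\mathcal{X}, f(f(\mathcal{Y}))\}} \supseteq c \cap f(f(\mathcal{Y}))$   (1)
    $\mathcal{V}_{\{\mathcal{X}, f(f(\mathcal{Y}))\}} \supseteq f(f(\mathcal{X})) \cap f(f(\mathcal{Y}))$   (1)
    $\mathcal{V}_{\{\mathcal{X}, f(f(\mathcal{Y}))\}} \supseteq f(\mathcal{V}_{\{f(\mathcal{X}), f(\mathcal{Y})\}})$ *   (4)
    $\mathcal{V}_{\{f(\mathcal{X}), f(\mathcal{Y})\}} \supseteq f(\mathcal{X}) \cap f(\mathcal{Y})$   (4)
    $\mathcal{V}_{\{f(\mathcal{X}), f(\mathcal{Y})\}} \supseteq f(\mathcal{V}_{\{\mathcal{X}, \mathcal{Y}\}})$ *   (4)
    $\mathcal{V}_{\{\mathcal{X}, \mathcal{Y}\}} \supseteq \mathcal{X} \cap \mathcal{Y}$   (4)
    $\mathcal{V}_{\{\mathcal{X}, \mathcal{Y}\}} \supseteq c \cap c$   (1)
    $\mathcal{V}_{\{\mathcal{X}, \mathcal{Y}\}} \supseteq c \cap f(f(f(\mathcal{Y})))$   (1)
    $\mathcal{V}_{\{\mathcal{X}, \mathcal{Y}\}} \supseteq f(f(\mathcal{X})) \cap c$   (1)
    $\mathcal{V}_{\{\mathcal{X}, \mathcal{Y}\}} \supseteq f(f(\mathcal{X})) \cap f(f(f(\mathcal{Y})))$   (1)
    $\mathcal{V}_{\{\mathcal{X}, \mathcal{Y}\}} \supseteq c$ *   (4)
    $\mathcal{V}_{\{\mathcal{X}, \mathcal{Y}\}} \supseteq f(\mathcal{V}_{\{f(\mathcal{X}), f(f(\mathcal{Y}))\}})$ *   (4)

Figure 7.4: Example Algorithm Execution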

Invariant 1 (Atomic Set Expression Invariant) A collection of
constraints $C$ satisfies the atomic set expression invariant if each element of
$atomic(C)$ either


• appears in $atomic(C_0)$, or

• is an intersection variable or of the form $f(\mathcal{X}_1, \ldots, \mathcal{X}_n)$ such that
$\mathcal{X}_1, \ldots, \mathcal{X}_n$ are intersection variables and $f$ is a function symbol
appearing in $C_0$. $\Box$

Proposition 22 Each $C_i$ constructed by the algorithm satisfies the atomic
set expression invariant.

Proof: Clearly $C_0$ satisfies the atomic set expression invariant. Now
suppose that $C_{i-1}$ satisfies this invariant. The constraints $C_i$ are defined to be
$\delta(C_{i-1}) \cup C_{i-1}$. If $\delta$ is either Transformation 1, 2 or 3, then the only difference
between $C_i$ and $C_{i-1}$ is that $C_i$ contains a new constraint of the form $\mathcal{X} \supseteq a$
such that $a$ is an atomic set expression, and moreover $a$ appears in $C_{i-1}$. It
follows that $atomic(C_i) \subseteq atomic(C_{i-1})$, and so $C_i$ satisfies the atomic set
expression invariant.

The remaining case is where $\delta$ is Transformation 4. In this case, the
difference between $C_i$ and $C_{i-1}$ is that $C_i$ contains a number of new constraints.
These new constraints are either of the form $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_n$ where the $a_j$
are atomic set expressions that appear in $C_{i-1}$, or $\mathcal{X} \supseteq f(\mathcal{X}_1, \ldots, \mathcal{X}_n)$ where
the $\mathcal{X}_i$ are intersection variables. Again $C_i$ satisfies the atomic set expression
invariant. $\Box$

The atomic set expression invariant proves two things. First, it verifies
that the specification of Transformation 4 and the use of the naming scheme
$\mathcal{V}_N$ are consistent. In particular, it shows that elements of the sets $N_j$
constructed in Transformation 4 are either atomic set expressions from $C_0$ or else
atomic set expressions that appear in some set $N$ such that $\mathcal{V}_N$ is a previously
constructed intersection variable. It follows that all variables introduced by
this transformation are of the form $\mathcal{V}_N$ such that $N \subseteq atomic(C_0)$.

Second, it shows that $\Delta_1$ is atomically bounded. This is because the
set $atomic(C_0)$ is finite and so there are only a finite number of variables
of the form $\mathcal{V}_N$ such that $N \subseteq atomic(C_0)$. It follows that there are only
a finite number of expressions of the form $f(\mathcal{X}_1, \ldots, \mathcal{X}_n)$ where each $\mathcal{X}_i$ is
an intersection variable. Hence, the cardinality of each $C_i$ is bounded by
K  +  where  K  is  the  cardinality  of  Co,  F  is  the  number  of  function 

symbols  appearing  in  Co,  and  n  is  the  maximum  arity  of  function  symbols 
appearing  in  Co. 

We next address the soundness of $\Delta_1$. The soundness of Transformations
1, 2 and 3 is straightforward.

Proposition 23 $lm(C) = lm(C \cup \delta(C))$ where $\delta$ is one of Transformations
1-3.


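Proof: Let $\delta$ be one of Transformations 1-3. It is sufficient to prove that
$lm(C)$ is a model of $\delta(C)$. Let $\mathcal{I}$ be $lm(C)$, and first consider Transformations
1 and 2. In this case $C$ contains $\mathcal{X} \supseteq se$ and $\mathcal{Y} \supseteq a$, and $\delta(C)$ contains the
constraint $\mathcal{X} \supseteq se'$ such that $se'$ is the result of replacing an occurrence of
$\mathcal{Y}$ in $se$ by $a$. Since $\mathcal{I}$ is a model of $C$, it follows that $\mathcal{I}(\mathcal{Y}) \supseteq \mathcal{I}(a)$ and
$\mathcal{I}(\mathcal{X}) \supseteq \mathcal{I}(se)$. Hence, by Proposition 19, $\mathcal{I}(se) \supseteq \mathcal{I}(se')$, and it follows that
$\mathcal{I}(\mathcal{X}) \supseteq \mathcal{I}(se')$. This completes the proof for this case.

In the case of Transformation 3, $C$ contains $\mathcal{X} \supseteq f^{-1}_{(i)}(f(a_1, \ldots, a_n))$
and $\mathcal{X} \supseteq a_i$ is output. The proof proceeds by showing that $\mathcal{I}$ is a model
of $\mathcal{X} \supseteq a_i$. Let $v_i \in \mathcal{I}(a_i)$. Now, since $explicit(C) \subseteq C$, it follows that
$lm(explicit(C)) \subseteq \mathcal{I}$. Hence if $lm(explicit(C))(a_j)$ is non-empty for each
$j \neq i$, then it must be the case that for each $a_j$, $j \neq i$, there exists a
$v_j \in \mathcal{I}(a_j)$. It follows that $f(v_1, \ldots, v_n) \in \mathcal{I}(f(a_1, \ldots, a_n))$, and so
$v_i \in \mathcal{I}(f^{-1}_{(i)}(f(a_1, \ldots, a_n)))$. But $\mathcal{I}$ satisfies $\mathcal{X} \supseteq f^{-1}_{(i)}(f(a_1, \ldots, a_n))$, and so
$v_i \in \mathcal{I}(\mathcal{X})$. This completes the proof that $\mathcal{I}$ is a model of $\mathcal{X} \supseteq a_i$. $\Box$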

The correctness of Transformation 4 is somewhat more difficult to prove
since it does not in general preserve the least model of the constraints. This
is because the correctness of this transformation relies on properties of the
intersection variables $\mathcal{V}_N$. For example, consider the constraints

$\mathcal{X} \supseteq f(\mathcal{X}) \cap f(c)$ and $\mathcal{V}_{\{\mathcal{X}, c\}} \supseteq c$.

The intersection transformation adds the constraints $\mathcal{X} \supseteq f(\mathcal{V}_{\{\mathcal{X}, c\}})$ and
$\mathcal{V}_{\{\mathcal{X}, c\}} \supseteq \mathcal{X} \cap c$. Clearly this does not preserve the least model of the
constraints: before the transformation the least model maps $\mathcal{X}$ into the empty
set, and after the transformation the least model maps $\mathcal{X}$ into $\{f(c)\}$.

Instead, we must first establish some properties of the intersection
variables introduced by the algorithm. Where $N$ is a finite set of atomic set
expressions $\{a_1, \ldots, a_n\}$, let $(\cap N)$ denote $a_1 \cap \cdots \cap a_n$. Now, consider the
following invariant.

Invariant 2 (Intersection Variable Invariant) A collection of
constraints $C$ satisfies the intersection variable invariant if $\mathcal{V}_N \in var(C)$ implies
that $lm(C) \models \mathcal{V}_N = (\cap N)$. $\Box$

It is straightforward to verify that this invariant is preserved by
Transformations 1, 2 and 3 because these transformations do not introduce any new
intersection variables. Specifically:


Proposition 24 If $C$ satisfies the intersection variable invariant then
$C \cup \delta(C)$ satisfies the intersection variable invariant where $\delta$ is one of
Transformations 1-3.


Proof: Let $C$ be a collection of constraints that satisfy the intersection
variable invariant and let $\delta$ be one of Transformations 1-3. Now, if $\mathcal{V}_N$ is an
intersection variable in $C \cup \delta(C)$ then by the definition of Transformations 1, 2
and 3, it is clear that $\mathcal{V}_N$ must appear in $C$. Since $C$ satisfies the intersection
variable invariant, $lm(C) \models \mathcal{V}_N = (\cap N)$. Also, by Proposition 23,
$lm(C) = lm(C \cup \delta(C))$. It is immediate that $lm(C \cup \delta(C)) \models \mathcal{V}_N = (\cap N)$. $\Box$

The next proposition deals with Transformation 4. It not only shows
that the transformation preserves the intersection variable invariant, but also
that the transformation is sound when applied to constraints that satisfy this
invariant.


Proposition 25 If $C$ satisfies the intersection variable invariant and $\delta$ is
Transformation 4, then


(1) $lm(C) =_{var(C)} lm(C \cup \delta(C))$, and

(2) $C \cup \delta(C)$ satisfies the intersection variable invariant.
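
Proof: Recall that if Transformation 4 is applied, then $C$ contains a
constraint of the form $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m$ where, for some $f \in \Sigma$, each $a_i$ is of
the form $f(a_{i,1}, \ldots, a_{i,n})$, and $\delta(C)$ consists of the constraints

$\mathcal{X} \supseteq f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n})$, and

$\mathcal{V}_{N_j} \supseteq a_{1,j} \cap \cdots \cap a_{m,j}$ for each $\mathcal{V}_{N_j}$ that does not appear in $C$

where $N_j = \bigcup_{i=1..m} \mathcal{N}(a_{i,j})$, $j = 1..n$.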

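The core part of the proof shows that the interpretation $\mathcal{I}$ defined by

$\mathcal{I}(\mathcal{X}) = lm(C)(a_{1,j} \cap \cdots \cap a_{m,j})$ if $\mathcal{X} \notin var(C)$ and $\mathcal{X}$ is $\mathcal{V}_{N_j}$, and
$\mathcal{I}(\mathcal{X}) = lm(C)(\mathcal{X})$ otherwise

is the least model of $C \cup \delta(C)$. By definition, it is clear that $\mathcal{I}$ is a model of
$C$ and of each new constraint $\mathcal{V}_{N_j} \supseteq a_{1,j} \cap \cdots \cap a_{m,j}$.
To show that $\mathcal{I}$ is also a model of $\mathcal{X} \supseteq f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n})$, consider the
following property of $\mathcal{I}$ for each $i$ and $j$:

$\mathcal{I}(a_{i,j}) = \mathcal{I}(\cap \mathcal{N}(a_{i,j}))$    (7.15)

If $a_{i,j}$ is not an intersection variable, then $\mathcal{N}(a_{i,j})$ is just $\{a_{i,j}\}$ and so (7.15)
is trivially true. On the other hand, if $a_{i,j}$ is an intersection variable, then
since it appears in $C$, (7.15) follows from the assumption that $C$ satisfies the
intersection variable invariant. Using equation (7.15), the following chain of
equalities can be established

$\mathcal{I}(a_{1,j} \cap \cdots \cap a_{m,j})$
    $= \mathcal{I}(\cap \mathcal{N}(a_{1,j})) \cap \cdots \cap \mathcal{I}(\cap \mathcal{N}(a_{m,j}))$
    $= \mathcal{I}(\cap (\mathcal{N}(a_{1,j}) \cup \cdots \cup \mathcal{N}(a_{m,j})))$

and this proves that, for all $j$, $\mathcal{I}(a_{1,j} \cap \cdots \cap a_{m,j}) = \mathcal{I}(\cap N_j)$. Now, if $\mathcal{V}_{N_j}$
is introduced by $\delta$, then $\mathcal{I}(\mathcal{V}_{N_j}) = \mathcal{I}(a_{1,j} \cap \cdots \cap a_{m,j})$ by definition of $\mathcal{I}$. If
$\mathcal{V}_{N_j}$ is not introduced by $\delta$, then it appears in $C$ and so $\mathcal{I}(\mathcal{V}_{N_j}) = \mathcal{I}(\cap N_j)$
because $C$ satisfies the intersection variable invariant. Combining these two
cases with $\mathcal{I}(a_{1,j} \cap \cdots \cap a_{m,j}) = \mathcal{I}(\cap N_j)$ proves that, for $j = 1..n$,

$\mathcal{I}(\mathcal{V}_{N_j}) = \mathcal{I}(\cap N_j) = \mathcal{I}(a_{1,j} \cap \cdots \cap a_{m,j})$    (7.16)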
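
Using this equality, it is easy to see that

$\mathcal{I}(f(a_{1,1}, \ldots, a_{1,n}) \cap \cdots \cap f(a_{m,1}, \ldots, a_{m,n}))$
    $= \mathcal{I}(f(a_{1,1} \cap \cdots \cap a_{m,1}, \ldots, a_{1,n} \cap \cdots \cap a_{m,n}))$
    $= \mathcal{I}(f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n}))$

and since $\mathcal{I}$ is a model of $\mathcal{X} \supseteq f(a_{1,1}, \ldots, a_{1,n}) \cap \cdots \cap f(a_{m,1}, \ldots, a_{m,n})$ it
follows that $\mathcal{I}$ is a model of $\mathcal{X} \supseteq f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n})$. This concludes the
proof that $\mathcal{I}$ is a model of $C \cup \delta(C)$, and so $\mathcal{I} \supseteq lm(C \cup \delta(C))$.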

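$\mathcal{I}$ is not only a model of $C \cup \delta(C)$, it is in fact the least model. To see
this, let $\mathcal{I}'$ be an arbitrary model of $C \cup \delta(C)$. If $\mathcal{X}$ is a variable that appears
in $C$, then $\mathcal{I}'(\mathcal{X}) \supseteq \mathcal{I}(\mathcal{X})$ because $\mathcal{I}' \supseteq lm(C)$ and $\mathcal{I}(\mathcal{X}) = lm(C)(\mathcal{X})$. If $\mathcal{X}$ is
one of the variables $\mathcal{V}_{N_j}$ introduced at this step, then consider the following
chain:

$\mathcal{I}'(\mathcal{V}_{N_j}) \supseteq \mathcal{I}'(a_{1,j} \cap \cdots \cap a_{m,j}) \supseteq \mathcal{I}(a_{1,j} \cap \cdots \cap a_{m,j}) = \mathcal{I}(\mathcal{V}_{N_j})$.

The first containment follows because $\mathcal{I}'$ is a model of $\delta(C)$. The second
is because $a_{1,j} \cap \cdots \cap a_{m,j}$ contains only variables from $C$, and it has just
been proved that $\mathcal{I}'(\mathcal{X}) \supseteq \mathcal{I}(\mathcal{X})$ for variables $\mathcal{X} \in var(C)$. The final equality
follows from (7.16). Hence $\mathcal{I}'(\mathcal{X}) \supseteq \mathcal{I}(\mathcal{X})$ for all variables $\mathcal{X}$, and so
$lm(C \cup \delta(C)) \supseteq \mathcal{I}$. Combining this with $\mathcal{I} \supseteq lm(C \cup \delta(C))$ proves that
$\mathcal{I} = lm(C \cup \delta(C))$.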

To complete the proof, note that by definition $\mathcal{I}$ agrees with $lm(C)$ on
$var(C)$, and so $lm(C) =_{var(C)} lm(C \cup \delta(C))$. This proves part (1) of the
proposition. To prove part (2), we need to show that $\mathcal{I}(\mathcal{V}_N) = \mathcal{I}(\cap N)$ for
all intersection variables $\mathcal{V}_N$ appearing in $C \cup \delta(C)$. If $\mathcal{V}_N \in var(C)$, then the fact
that $C$ satisfies the intersection variable invariant implies that

$\mathcal{I}(\mathcal{V}_N) = lm(C)(\mathcal{V}_N) = lm(C)(\cap N) = \mathcal{I}(\cap N)$

On the other hand, if $\mathcal{V}_N$ is introduced by $\delta(C)$, then $\mathcal{I}(\mathcal{V}_N) = \mathcal{I}(\cap N)$
follows from (7.16). $\Box$

Combining these propositions proves the necessary soundness property
of $\mathcal{A}_1$.

Lemma 11 (Soundness) $\mathcal{A}_1$ is sound on each $C_i$ constructed by the algorithm.


Proof: First, it is clear that each $C_i$ constructed by the algorithm satisfies
the intersection variable invariant. This is because $C_0$ satisfies the invariant
(since it does not contain any intersection variables) and each transformation
preserves the invariant (see Propositions 24 and 25). It remains to show that
$lm(C_i) =_{var(C_i)} lm(C_i \cup \delta(C_i))$ for each $C_i$ and $\delta$. This is easy since if $\delta$ is
Transformation 1, 2 or 3, then $lm(C_i) = lm(C_i \cup \delta(C_i))$ by Proposition 23, and
if $\delta$ is Transformation 4, then $lm(C_i) =_{var(C_i)} lm(C_i \cup \delta(C_i))$ by Proposition
25. □

As an aside, note that the correctness of this lemma makes the implicit
assumption that the intersection variables $\mathcal{V}_N$ introduced during the execution
of the algorithm are distinct from $var(C_0)$. Clearly this can always be done
by choosing the intersection variables introduced in Transformation 4 from
$VAR - var(C_0)$. Strictly speaking, $\mathcal{A}_1$ should be parameterized by $var(C_0)$ to
denote this dependence on $var(C_0)$.

So far, we have proved that $\mathcal{A}_1$ is atomically bounded (this follows from
the atomic set expression invariant) and sound. It remains to prove
termination and completeness.


Lemma 12 (Termination) Let $C_0$ be a collection of constraints in standard
form. Then the instance of the generic algorithm defined by $\mathcal{A}_1$
terminates on $C_0$.


Proof: Each $C_i$ constructed by the algorithm is in standard form. Moreover,
by inspection of the transformations, it is clear that $op$ appears in $C_i$ iff it
appears in $C_0$. Hence, each constraint in each $C_i$ must be of one of the
following forms:


• $\mathcal{X} \supseteq a$ where $a$ is an atomic set expression;

• $\mathcal{X} \supseteq f^{-1}_{(i)}(a)$ where $a$ is atomic and appears in $C_0$, or

• $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_n$ where $C_0$ contains an expression of the form $a'_1 \cap \cdots \cap a'_n$.

Combining this with the atomic set expression invariant (proved in Proposition
22) proves that there are only a finite number of different constraints
that may be constructed by the algorithm. Since the collections $C_i$ are
monotonically increasing during the algorithm's execution, it follows that
for some $i \geq 1$, $C_i = C_{i-1}$, at which point the algorithm terminates with
output $explicit(C_i)$. □
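
The shape of this termination argument can be illustrated with a small saturation loop. The following is a minimal sketch, not the implementation described in this thesis: it models only one transitivity-style rule in the spirit of Transformation 2 (from X ⊇ Y and Y ⊇ a, with a non-variable, add X ⊇ a), and the pair encoding of constraints and the names `is_var`, `step` and `saturate` are assumptions made for illustration.

```python
# Constraints are pairs (lhs, rhs), read as "lhs ⊇ rhs".
# Set variables are strings starting with an upper-case letter;
# anything else is treated as a non-variable atomic expression.

def is_var(e):
    return isinstance(e, str) and e[:1].isupper()

def step(constraints):
    """One round of a transitivity-style rule:
    from X ⊇ Y and Y ⊇ a (a non-variable) derive X ⊇ a."""
    derived = set()
    for (x, y) in constraints:
        if is_var(y):
            for (y2, a) in constraints:
                if y2 == y and not is_var(a):
                    derived.add((x, a))
    return constraints | derived

def saturate(c0):
    """Iterate until C_i == C_(i-1). The collections only grow and only
    finitely many constraints can be built, so the loop stops."""
    current = frozenset(c0)
    rounds = 0
    while True:
        nxt = frozenset(step(current))
        rounds += 1
        if nxt == current:
            return current, rounds
        current = nxt

c0 = {("X", "Y"), ("Y", "Z"), ("Z", "c")}
closed, rounds = saturate(c0)
```

On this input the loop adds Y ⊇ c, then X ⊇ c, and then detects that nothing new can be derived; the stopping test is exactly the condition $C_i = C_{i-1}$ used in the proof.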

We  now  address  completeness.  The  proof  of  this  is  somewhat  involved 
and  represents  the  core  part  of  the  correctness  of  the  algorithm.  In  essence 
there  is  a  tension  between  completeness  and  termination:  the  transforma¬ 
tions  must  be  applied  sufficiently  often  that  the  least  model  of  the  constraints 
eventually  becomes  explicit,  but  not  so  often  that  they  can  be  applied  in¬ 
finitely  often. 

Lemma 13 (Completeness) $\mathcal{A}_1$ is complete.


Proof: Let $C$ be the result of exhaustively applying $\mathcal{A}_1$ to a collection
of constraints. This implies that $\delta(C) \subseteq C$ for all transformations $\delta$ in
$\mathcal{A}_1$. Adopting the notation of the generic algorithm, let the sequence of
constraints obtained by this exhaustive application be $C_0, C_1, \ldots, C_l$ where
$C_l = C$. Let $\mathcal{D}$ denote the subset of constraints in $C$ of form $\mathcal{X} \supseteq a$ where $a$
is a non-variable atomic set expression. Clearly $\mathcal{D} \subseteq explicit(C) \subseteq C$, and so
$lm(\mathcal{D}) \subseteq lm(explicit(C)) \subseteq lm(C)$. The remainder of the proof shows that
$lm(\mathcal{D}) = lm(C)$, and it is clear that this implies $lm(explicit(C)) = lm(C)$, as
required by the definition of completeness.


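Since $lm(\mathcal{D}) \subseteq lm(C)$, it only remains to prove that $lm(\mathcal{D}) \supseteq lm(C)$, and
this can be established by showing that $lm(\mathcal{D})$ is a model of $C$. Let $\mathcal{I}_\mathcal{D}$ denote
$lm(\mathcal{D})$. Proposition 17 shows that $v \in \mathcal{I}_\mathcal{D}(\mathcal{X})$ iff there exists a constraint
$\mathcal{X} \supseteq a$ in $\mathcal{D}$ such that $v \in \mathcal{I}_\mathcal{D}(a)$. Since $\mathcal{D}$ consists of those constraints in
$C$ that have the form $\mathcal{X} \supseteq a$ where $a$ is atomic and non-variable, it follows
that

$v \in \mathcal{I}_\mathcal{D}(\mathcal{X})$ iff $v \in \mathcal{I}_\mathcal{D}(a)$ for some constraint $\mathcal{X} \supseteq a$ in $C$
where $a$ is a non-variable atomic set expression
(7.17)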
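
Property (7.17) can be read as a membership test for the least model of the explicit constraints: a value belongs to a variable exactly when some explicit constraint for that variable covers it. The following is a minimal sketch under stated assumptions: values are ground terms encoded as nested tuples, atomic set expressions are either variable names or tuples `(f, a_1, ..., a_n)`, and the names `member` and `D` are illustrative only.

```python
# Values: ("f", v1, ..., vn); constants are 1-tuples such as ("zero",).
# Atomic set expressions: a variable name (str) or ("f", a1, ..., an).
# D maps each set variable to the non-variable atomic expressions a
# that occur in explicit constraints X ⊇ a.

def member(v, a, D):
    """Decide whether v is in the least model's value for a,
    by structural recursion on the value v."""
    if isinstance(a, str):                       # a is a set variable
        return any(member(v, rhs, D) for rhs in D.get(a, ()))
    if a[0] != v[0] or len(a) != len(v):         # head symbols differ
        return False
    return all(member(vi, ai, D) for vi, ai in zip(v[1:], a[1:]))

# Nat ⊇ zero, Nat ⊇ succ(Nat): a regular description of the numerals.
D = {"Nat": [("zero",), ("succ", "Nat")]}
two = ("succ", ("succ", ("zero",)))
in_nat = member(two, "Nat", D)
```

The recursion terminates because every right-hand side in `D` is non-variable, so each lookup of a variable is followed by a step that strips one function symbol from the value.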


The remainder of the proof uses this fact to show that $\mathcal{I}_\mathcal{D}$ is a model of
$C$. Consider each possible constraint in $C$ in turn:


Case (i): Consider a constraint of the form $\mathcal{X} \supseteq a$ where $a$ is an atomic
set expression. First suppose that $a$ is not a set variable. This means that
$\mathcal{X} \supseteq a$ appears in $\mathcal{D}$ and so it is immediate that $\mathcal{I}_\mathcal{D}$ is a model of such a
constraint. On the other hand, suppose that $a$ is a set variable, say $\mathcal{Y}$, and
let $v$ be a value such that $v \in \mathcal{I}_\mathcal{D}(\mathcal{Y})$. From (7.17) it follows that there exists
a constraint $\mathcal{Y} \supseteq a'$ in $C$ where $a'$ is a non-variable atomic set expression
such that $v \in \mathcal{I}_\mathcal{D}(a')$. So, $C$ contains $\mathcal{X} \supseteq \mathcal{Y}$ and $\mathcal{Y} \supseteq a'$, and since the
application of Transformation 2 to $C$ does not produce any new constraints,
it must be the case that $\mathcal{X} \supseteq a'$ already appears in $C$. Hence $\mathcal{X} \supseteq a'$ is in $\mathcal{D}$
and so $v \in \mathcal{I}_\mathcal{D}(\mathcal{X})$. This completes the proof that $\mathcal{I}_\mathcal{D}$ is a model of $\mathcal{X} \supseteq a$.

Case (ii): Consider a constraint of the form $\mathcal{X} \supseteq f^{-1}_{(i)}(a)$ where $a$ is an
atomic set expression. First suppose that $a$ is not a set variable and let
$v \in \mathcal{I}_\mathcal{D}(f^{-1}_{(i)}(a))$. This means that there exists a value $f(v_1, \ldots, v_n) \in \mathcal{I}_\mathcal{D}(a)$
such that $v_i$ is $v$. Since $a$ is not a set variable or $\top$ (recall that $\top$ is omitted
from this section), it must be the case that $a$ is of the form $f(a_1, \ldots, a_n)$
where each $a_j$ is an atomic set expression such that $v_j \in \mathcal{I}_\mathcal{D}(a_j)$. This
implies that each $a_j$ is non-empty in the least model of $explicit(C)$ and so
the preconditions of Transformation 3 are satisfied. Hence the constraint
$\mathcal{X} \supseteq a_i$ must already appear in $C$. By case (i), $\mathcal{I}_\mathcal{D}$ is a model of $\mathcal{X} \supseteq a_i$, and
it follows that $v \in \mathcal{I}_\mathcal{D}(\mathcal{X})$, and so $\mathcal{I}_\mathcal{D}$ is a model of $\mathcal{X} \supseteq f^{-1}_{(i)}(a)$.

On the other hand, suppose that $a$ is a set variable, say $\mathcal{Y}$, and let
$v \in \mathcal{I}_\mathcal{D}(f^{-1}_{(i)}(\mathcal{Y}))$. This means that there exists a value $f(v_1, \ldots, v_n) \in \mathcal{I}_\mathcal{D}(\mathcal{Y})$
such that $v_i$ is $v$. By (7.17), there exists a constraint $\mathcal{Y} \supseteq a'$ such that $a'$ is
a non-variable atomic set expression and $f(v_1, \ldots, v_n) \in \mathcal{I}_\mathcal{D}(a')$. Since the
application of Transformation 2 to $C$ does not produce any new constraints,
it must be the case that $\mathcal{X} \supseteq f^{-1}_{(i)}(a')$ already appears in $C$. Clearly
$v = v_i \in \mathcal{I}_\mathcal{D}(f^{-1}_{(i)}(a'))$. Moreover, we have just argued that $\mathcal{I}_\mathcal{D}$ must satisfy such
a constraint. It follows that $v \in \mathcal{I}_\mathcal{D}(\mathcal{X})$, and this completes the proof that
$\mathcal{I}_\mathcal{D}$ satisfies $\mathcal{X} \supseteq f^{-1}_{(i)}(a)$.

Case (iii): The final case deals with constraints involving intersection. Such
constraints are of the form $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m$ where each $a_i$ is an atomic set
expression and $m \geq 2$. The proof is by induction on $v$ and the induction
hypothesis is: for all values $v$ and for all constraints $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m$,
$m \geq 2$, appearing in $C$,


(a) $v \in \mathcal{I}_\mathcal{D}(a_1 \cap \cdots \cap a_m)$ implies $v \in \mathcal{I}_\mathcal{D}(\mathcal{X})$, and

(b) if $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m$ is introduced by an application of Transformation
4 then $v \in \mathcal{I}_\mathcal{D}(\mathcal{X})$ implies $v \in \mathcal{I}_\mathcal{D}(a_1 \cap \cdots \cap a_m)$.

Let $v$ be a value such that the induction hypothesis holds for all values
with fewer function symbols than $v$. Before considering (a) and (b), it is
convenient to first prove the following statement: if $v'$ has fewer symbols
than $v$ and $a_1, \ldots, a_k$ appear in $C_i$ then

$$v' \in \mathcal{I}_\mathcal{D}(a_1 \cap \cdots \cap a_k) \ \text{ iff } \ \bigwedge_{a \in N} v' \in \mathcal{I}_\mathcal{D}(a) \quad \text{where } N = \bigcup_{j=1..k} \mathcal{N}(a_j) \qquad (7.18)$$

This is proved by a secondary induction on $i$. Suppose that (7.18) holds for
all $i' < i$. Let $a_1, \ldots, a_k$ appear in $C_i$ and consider the following chain of
propositions.

$$v' \in \mathcal{I}_\mathcal{D}(a_1 \cap \cdots \cap a_k) \ \text{ iff } \ \bigwedge_{j=1..k} v' \in \mathcal{I}_\mathcal{D}(a_j)$$

$$\text{iff } \ \bigwedge_{j=1..k} \ \bigwedge_{a \in \mathcal{N}(a_j)} v' \in \mathcal{I}_\mathcal{D}(a)$$

$$\text{iff } \ \bigwedge_{a \in N} v' \in \mathcal{I}_\mathcal{D}(a) \quad \text{where } N = \bigcup_{j=1..k} \mathcal{N}(a_j)$$

The first step is just an expansion of $\cap$. For the second step, take each
$a_j$ in turn and consider two cases. If $a_j$ is not an intersection variable
then $\mathcal{N}(a_j) = \{a_j\}$ and so the second step is trivial. On the other hand,
suppose that $a_j$ is an intersection variable, say $\mathcal{V}_{N_j}$. Corresponding to
$\mathcal{V}_{N_j}$ there exists a constraint $\mathcal{V}_{N_j} \supseteq a'_1 \cap \cdots \cap a'_l$, $l \geq 2$, that is introduced
by Transformation 4. Moreover, this constraint must appear in $C_{i-1}$ and
$N_j = \mathcal{N}(a'_1) \cup \cdots \cup \mathcal{N}(a'_l)$. Now, since $v'$ is smaller than $v$ and $C_{i-1}$ is
constructed before $C_i$, the main induction hypothesis and the secondary
induction hypothesis respectively imply that

$$v' \in \mathcal{I}_\mathcal{D}(a'_1 \cap \cdots \cap a'_l) \ \text{ iff } \ v' \in \mathcal{I}_\mathcal{D}(\mathcal{V}_{N_j}), \text{ and}$$

$$v' \in \mathcal{I}_\mathcal{D}(a'_1 \cap \cdots \cap a'_l) \ \text{ iff } \ \bigwedge_{a \in N_j} v' \in \mathcal{I}_\mathcal{D}(a)$$

and the second step follows immediately. The final step in the chain follows
from the definition of $N$. This completes the inductive proof of (7.18). The
following key property is an immediate corollary of (7.18): if $v'$ has fewer
symbols than $v$, and $\mathcal{V}_N, a_1, \ldots, a_k$ appear in $C_i$ where $N = \mathcal{N}(a_1) \cup \cdots \cup \mathcal{N}(a_k)$,
then

$$v' \in \mathcal{I}_\mathcal{D}(\mathcal{V}_N) \ \text{ iff } \ v' \in \mathcal{I}_\mathcal{D}(a_1 \cap \cdots \cap a_k) \qquad (7.19)$$

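Now consider part (a) of the main induction hypothesis. Assume that
$v \in \mathcal{I}_\mathcal{D}(a_1 \cap \cdots \cap a_m)$. It follows that $v \in \mathcal{I}_\mathcal{D}(a_i)$, $i = 1..m$. Now, if one of the $a_j$ is
a set variable, say $\mathcal{Y}$, then (7.17) implies that there exists a constraint $\mathcal{Y} \supseteq a$
in $C$ where $a$ is a non-variable atomic set expression such that $v \in \mathcal{I}_\mathcal{D}(a)$.
This means that the preconditions of Transformation 3 are satisfied, and so
the constraint $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_{j-1} \cap a \cap a_{j+1} \cap \cdots \cap a_m$ must appear in $C$.

This argument may be repeated if necessary, and it follows that $C$ must
contain a constraint of the form $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m$ where each $a_i$ is a
non-variable atomic set expression such that $v \in \mathcal{I}_\mathcal{D}(a_i)$, $i = 1..m$. Let $v$ be
$f(v_1, \ldots, v_n)$. Then each $a_i$ must be of the form $f(a_{i,1}, \ldots, a_{i,n})$ such that
$v_j \in \mathcal{I}_\mathcal{D}(a_{i,j})$, $i = 1..m$, $j = 1..n$. This implies that $v_j \in \mathcal{I}_\mathcal{D}(a_{1,j} \cap \cdots \cap a_{m,j})$.
Since $C$ contains all constraints generated by Transformation 4, it follows
that $C$ contains $\mathcal{X} \supseteq f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n})$ such that $N_j = \mathcal{N}(a_{1,j}) \cup \cdots \cup \mathcal{N}(a_{m,j})$.
By (7.19), $v_j \in \mathcal{I}_\mathcal{D}(\mathcal{V}_{N_j})$, $j = 1..n$, and so $v \in \mathcal{I}_\mathcal{D}(f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n}))$. Since
$\mathcal{X} \supseteq f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n})$ appears in $C$, and hence in $\mathcal{D}$, it follows that
$v \in \mathcal{I}_\mathcal{D}(\mathcal{X})$, and this completes the proof of (a).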
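
The construction used in part (a) can be sketched in code. The fragment below is an illustration under assumptions, not the definition of Transformation 4 given in this thesis: it takes an intersection of terms with the same head symbol f and builds $f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n})$ together with the defining constraints $\mathcal{V}_{N_j} \supseteq a_{1,j} \cap \cdots \cap a_{m,j}$. Intersection variables are represented as `("V", frozenset)` pairs (so the tag `"V"` is reserved), the helper `flatten` plays the role of $\mathcal{N}$, and the sketch always introduces an intersection variable, even where a real implementation might avoid it.

```python
def flatten(a):
    """Role of N(a): an intersection variable stands for the set of
    atoms it intersects; any other atom stands for itself."""
    if isinstance(a, tuple) and a[0] == "V":
        return a[1]
    return frozenset([a])

def intersect_terms(x, terms):
    """terms: list of (f, a_1, ..., a_n) sharing the symbol f.
    Returns the new constraint x ⊇ f(V_N1, ..., V_Nn) and the
    defining constraints V_Nj ⊇ a_1j ∩ ... ∩ a_mj."""
    f, arity = terms[0][0], len(terms[0]) - 1
    assert all(t[0] == f and len(t) - 1 == arity for t in terms)
    new_args, defs = [], []
    for j in range(1, arity + 1):
        column = [t[j] for t in terms]          # a_1j, ..., a_mj
        n_j = frozenset().union(*(flatten(a) for a in column))
        v = ("V", n_j)
        new_args.append(v)
        defs.append((v, tuple(column)))
    return (x, (f,) + tuple(new_args)), defs

# X ⊇ f(A, B) ∩ f(C, V_{B,D})
constraint, defs = intersect_terms(
    "X", [("f", "A", "B"), ("f", "C", ("V", frozenset({"B", "D"})))])
```

Keying each intersection variable by the set $N_j$ means that two intersections over the same atoms share one variable, which is what keeps the number of introduced variables finite.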

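To prove (b), suppose that $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m$ is introduced by an
application of Transformation 4. By inspection of Transformation 4, $\mathcal{X}$ must be
an intersection variable. The first part of the proof shall establish

$$\text{if } (\mathcal{X} \supseteq se) \in C \text{ then } v \in \mathcal{I}_\mathcal{D}(se) \text{ implies } v \in \mathcal{I}_\mathcal{D}(a_1 \cap \cdots \cap a_m) \qquad (7.20)$$

Now, any constraint in $C$ that has the form $\mathcal{X} \supseteq se$ must be the result of
applications of transformations to $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m$ (since initially this is
the only constraint in $C$ involving $\mathcal{X}$). Moreover, the only transformations
that can be involved in this process are Transformations 2 and 4. From
inspection of these transformations it must be the case that a constraint of
the form $\mathcal{X} \supseteq se$ in $C$ must be such that $se$ is either:

($\alpha$) $a'_1 \cap \cdots \cap a'_m$ such that each $a'_i$ is either $a_i$ or a non-variable atomic set
expression such that $a_i \supseteq a'_i$ appears in $C$, or

($\beta$) $f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n})$ such that there is a constraint $\mathcal{X} \supseteq f(a_{1,1}, \ldots, a_{1,n}) \cap
\cdots \cap f(a_{m,1}, \ldots, a_{m,n})$ in $C$ such that $N_j = \mathcal{N}(a_{1,j}) \cup \cdots \cup \mathcal{N}(a_{m,j})$, $j = 1..n$.

Now, in case ($\alpha$), it is easy to see that $\mathcal{I}_\mathcal{D}(a_i) \supseteq \mathcal{I}_\mathcal{D}(a'_i)$ because either $a'_i$ is
identical to $a_i$ or else $a_i \supseteq a'_i$ appears in $C$ and hence in $\mathcal{D}$ (noting that $a_i$
must be a variable and $a'_i$ is a non-variable atomic set expression). Hence it
follows that $\mathcal{I}_\mathcal{D}(a_1 \cap \cdots \cap a_m) \supseteq \mathcal{I}_\mathcal{D}(se)$ and this proves (7.20).

To prove (7.20) in case ($\beta$), assume that $v \in \mathcal{I}_\mathcal{D}(se)$. This implies
that $v$ must be of the form $f(v_1, \ldots, v_n)$ such that $v_j \in \mathcal{I}_\mathcal{D}(\mathcal{V}_{N_j})$, $j = 1..n$.
By (7.19), $v_j \in \mathcal{I}_\mathcal{D}(a_{i,j})$, $i = 1..m$, $j = 1..n$. It follows that
$v \in \mathcal{I}_\mathcal{D}(f(a_{i,1}, \ldots, a_{i,n}))$, $i = 1..m$. Hence

$$v \in \mathcal{I}_\mathcal{D}(f(a_{1,1}, \ldots, a_{1,n}) \cap \cdots \cap f(a_{m,1}, \ldots, a_{m,n})).$$

Now, the expression $f(a_{1,1}, \ldots, a_{1,n}) \cap \cdots \cap f(a_{m,1}, \ldots, a_{m,n})$ must satisfy
($\alpha$), for which (7.20) has already been proved. Hence $v \in \mathcal{I}_\mathcal{D}(a_1 \cap \cdots \cap a_m)$
and this proves (7.20) for case ($\beta$).

Finally, (b) can be proved as follows. If $v \in \mathcal{I}_\mathcal{D}(\mathcal{X})$ then there exists a
constraint $\mathcal{X} \supseteq a$ in $\mathcal{D}$ such that $v \in \mathcal{I}_\mathcal{D}(a)$. Since $\mathcal{X} \supseteq a$ also appears in $C$,
it follows from (7.20) that $v \in \mathcal{I}_\mathcal{D}(a_1 \cap \cdots \cap a_m)$. □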

Now, combining the above lemmas with Theorem 7 proves the following.

Theorem 8 (Correctness of projection-intersection Algorithm)
When input with a collection $C_0$ of standard form constraints, the
projection-intersection algorithm terminates and outputs explicit form constraints $C_{out}$
such that $lm(C_{out}) =_{var(C_0)} lm(C_0)$.

We note that a large part of the correctness proof for the
projection-intersection algorithm involves showing that the algorithm computes exactly
the least model of the set constraints. This arises because of the philosophy
behind set based approximation: that we should strive for simple declarative
notions of program approximation, independent of algorithmic details. Hence,
algorithm "correctness" in this context involves showing that our algorithm
is faithful to the definition of set based program approximation, and this is
achieved by showing that each step of the algorithm is equivalence preserving.
In contrast, if we are only interested in showing that the algorithm is a
conservative approximation of the program (and this is the case in most other
works on program analysis) then only Lemmas 12 and 13 are needed.

We also note that the above algorithm is incremental in the sense that
the execution of the main loop could be halted at any stage and extra
constraints could be added. Moreover, the solving of projections and
intersections is closely intertwined. An alternative formulation of the algorithm can
be obtained by separating the solving of projection and intersection (such
an approach is taken by Heintze and Jaffar in [22]). The advantage of this
is that the proofs relating to the solving of intersections become simpler. In
essence  this  is  because  all  of  the  intersections  can  be  identified  and  performed 
together  and  so  complex  induction  hypotheses  and  intersection  variable  in¬ 
variants  are  not  needed.  The  disadvantage  is  efficiency.  In  particular,  by 
integrating  the  solving  of  projection  and  intersection,  the  simplification  of 
projections  becomes  significantly  cheaper. 


7.6  Quantified  Set  Expressions 


This section generalizes the algorithm of the last section to deal with the
quantified set expressions that appear in $SC_P$. Specifically, the set
constraints considered are constructed from $OP_2$, where $OP_2$ consists of
quantified operators and complement constants, as well as the set operators in
$OP_1$ (note that, unlike the previous section, we shall include $\top$). Three
main complications arise in extending the algorithm from the previous
section to deal with $SC_P$. First, reasoning about quantified set expressions
involves reasoning about whether a program term is defined under an
environment, and this is inherently complex. Second, reasoning about apartness
conditions $s \notin se$ requires the introduction of complement constants. Third,
although $lm(SC_P)$ turns out to be decidable, $lm(C)$ for a collection $C$ of
arbitrary set constraints constructed from $OP_2$ is not in general decidable. This
means that termination of the algorithm requires careful reasoning about
the details of the constraints that arise during its execution.

Before  presenting  the  algorithm,  we  shall  first  outline  why  the  general 
problem  of  solving  set  constraints  constructed  from  OP2  is  not  decidable.  In 
essence,  this  is  because  the  combination  of  projection  and  function  symbols 
on  the  left  hand  sides  of  quantified  set  expressions  allows  unbounded  notions 
of  dependency  to  be  introduced  (see  the  discussion  in  Section  5.4,  page  123). 
More concretely, let g, f and c be function symbols of arity 2, 1 and 0
respectively and consider the constraints

$$\mathcal{X} \supseteq g(c, c)$$

$$\mathcal{X} \supseteq \{X : g(f^{-1}_{(1)}(g^{-1}_{(1)}(X)),\ f^{-1}_{(1)}(g^{-1}_{(2)}(X))) \in \mathcal{X}\}$$

In essence the quantified set expression in the second constraint unpackages
values in $\mathcal{X}$ (which are all of the form $g(v_1, v_2)$), wraps an "f" around both
arguments and then packages them up again. The least model of these constraints maps
$\mathcal{X}$ into the set $\{g(f^n(c), f^n(c)) : n \geq 0\}$. This example can be extended to
code up undecidable problems such as Post's correspondence problem. For
example, let $(\sigma_1, \tau_1), \ldots, (\sigma_n, \tau_n)$ be a collection of pairs of strings. Now,
treat the letters that make up these strings as unary function symbols, and
write the constraints

$$\mathcal{X} \supseteq g(c, c)$$

$$\mathcal{X} \supseteq \{X : g(\sigma_i^{-1}(g^{-1}_{(1)}(X)),\ \tau_i^{-1}(g^{-1}_{(2)}(X))) \in \mathcal{X}\}, \quad i = 1..n$$

where the notation $\sigma^{-1}$ denotes the sequence of projection operators such
that $\mathcal{I}(\sigma^{-1}(\sigma(se))) = \mathcal{I}(se)$ for all interpretations $\mathcal{I}$ and set expressions $se$.
For example, if $\sigma$ is the string fgg then $\sigma^{-1}(se)$ denotes a composition of
three projection operators, one for each letter of $\sigma$, applied to $se$.
It is easy to see that the least model of these constraints maps $\mathcal{X}$ into the
set of all values of the form

$$g(\sigma_{i_1}(\sigma_{i_2}(\cdots \sigma_{i_k}(c) \cdots)),\ \tau_{i_1}(\tau_{i_2}(\cdots \tau_{i_k}(c) \cdots)))$$

where $\{i_1, i_2, \ldots, i_k\} \subseteq \{1, \ldots, n\}$. As a special case, we can also define the
set $\{g(\sigma(c), \sigma(c)) : \text{for all strings } \sigma\}$. Combining these two observations
with the fact that $OP_2$ includes intersection operators proves that any Post's
correspondence problem can be coded up in constraints $C$ such that, for
some variable $\mathcal{Y} \in var(C)$, $lm(C)(\mathcal{Y}) = \{\}$ iff the correspondence problem
has no solution.
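
The encoding can be made concrete by enumerating, up to a bounded depth, the values that such constraints place in $\mathcal{X}$. The sketch below is only an illustration: it represents a value $g(s(c), t(c))$ by the pair of strings `(s, t)`, and the function names `pcp_values` and `has_match` and the depth bound are assumptions. Since the problem is undecidable, no fixed bound suffices in general.

```python
def pcp_values(pairs, depth):
    """Values reachable from g(c, c) by at most `depth` applications of
    the constraints: if g(s, t) is in X then so is g(sigma_i s, tau_i t).
    A value g(s(c), t(c)) is encoded as the pair of strings (s, t)."""
    seen = {("", "")}
    frontier = {("", "")}
    for _ in range(depth):
        frontier = {(s_i + s, t_i + t)
                    for (s, t) in frontier
                    for (s_i, t_i) in pairs} - seen
        seen |= frontier
    return seen

def has_match(pairs, depth):
    """True if some value other than g(c, c) has equal components, i.e.
    it also lies in the 'diagonal' set of values g(s(c), s(c))."""
    return any(s == t and s != "" for (s, t) in pcp_values(pairs, depth))

# The single pair (f, f) reproduces the first example: g(f^n(c), f^n(c)).
powers = pcp_values([("f", "f")], 3)
```

For instance, the pairs `("a", "ab")` and `("ba", "a")` have a match at depth 2 (both components become `aba`), whereas the single pair `("a", "b")` never produces one at any depth.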

The undecidability of the class of set constraints constructed from $OP_2$
means that the correctness and termination of the algorithm presented in
this section depends crucially on the form of the quantified set expressions
appearing in $SC_P$. By inspection, these expressions are of the form
$\{X : conj\}$ such that $X$ is a program variable and each quantified condition in
$conj$ has one of the following forms:

(I) $s \in se$ where $s$ is constructed from projections and program variables,

(II) $s \notin se$ where $s$ is constructed from projections and program variables,

(III) $t \in a$ where $t$ is an arbitrary program term and $a$ is a ground atomic
set expression, or

(IV) $s \in se$ where $s$ is constructed from function symbols and program
variables,


where, in each case, $se$ is a set expression containing only set variables,
projections and function symbols. The first kind of quantified condition
arises from program conditions of the form $s = t$. The second arises from
program conditions of the form $s \neq t$. The third arises from the construction
of $defined(t)$ in the constraints corresponding to an assignment statement.
Finally, the last kind of quantified condition arises during the translation of
environment constraints corresponding to logic program rules.

To control the form of the quantified set expressions that are constructed
during the algorithm, it is convenient to maintain quantified set expressions
in an even more restrictive form, which we now describe.


Reduced  Form  Quantified  Set  Expressions 

A conjunction of quantified conditions $conj$ is in reduced form if no condition
appears twice in $conj$ and each quantified condition in $conj$ is one of the
following forms:


(a) $X \in se$,

(b) $s \in \mathcal{X}$ where $s$ consists of program variables and function symbols, or

(c) $s \notin se$

where, unless otherwise specified, $X$ is a program variable, $se$ is a set
expression and $s$ is a program term. In other words, quantified conditions such
as $g^{-1}_{(2)}(X) \in \mathcal{Z}$ and $g(X, Y) \in g(\mathcal{W}, \mathcal{Z})$ are excluded. A collection of
constraints $C$ is in reduced form if all conjunctions therein are in reduced form.
The algorithm maintains constraints in reduced form via a reduction process
that converts constraints into reduced form constraints. In essence, this
involves rewriting quantified conditions such as $g^{-1}_{(2)}(X) \in \mathcal{Z}$ into $X \in g(\top, \mathcal{Z})$
and $g(X, Y) \in g(\mathcal{W}, \mathcal{Z})$ into $X \in \mathcal{W} \wedge Y \in \mathcal{Z}$.

(i) Replace $f^{-1}_{(i)}(s) \in se$
by $s \in f(\top, \ldots, \top, se, \top, \ldots, \top)$
($se$ appears at the $i$-th argument of $f$).

(ii) Replace $f(s_1, \ldots, s_n) \in f(se_1, \ldots, se_n)$
by $s_1 \in se_1 \wedge \cdots \wedge s_n \in se_n$.

(iii) Replace $f(\cdots) \in \top$ or $f(\cdots) \in \overline{g(\cdots)}$ such that $f \neq g$
by true.

(iv) Replace $f(s_1, \ldots, s_n) \in \overline{\{se_1, \ldots, se_m\}}$
by $f(s_1, \ldots, s_n) \in \overline{se_1} \wedge \cdots \wedge f(s_1, \ldots, s_n) \in \overline{se_m}$.

(v) Delete $\mathcal{X} \supseteq \{X : conj \wedge f(\cdots) \in se\}$ if $se$ is either $\bot$,
$g(\cdots)$ such that $f \neq g$, or $\overline{S}$ such that $f(\top, \ldots, \top) \in S$.

(vi) Replace $\mathcal{X} \supseteq \{X : conj \wedge f(s_1, \ldots, s_n) \in \overline{f(se_1, \ldots, se_n)}\}$
by $\mathcal{X} \supseteq \{X : conj \wedge s_1 \in \overline{se_1}\}, \ldots, \mathcal{X} \supseteq \{X : conj \wedge s_n \in \overline{se_n}\}$.

Figure 7.5: The Rewrite Steps of REDUCE

For notational convenience we shall treat a conjunction $conj$ of quantified
conditions as a set of quantified conditions. That is, if $exp_1, \ldots, exp_n$
are quantified conditions, then the conjunction $exp_1 \wedge \cdots \wedge exp_n$ and the set
$\{exp_1, \ldots, exp_n\}$ shall be used interchangeably. Now, consider the rewrite
steps in Figure 7.5 for simplifying the quantified conditions in a collection $C$
of set constraints. In these steps, $se, se_1, \ldots, se_n$ are set expressions,
$s, s_1, \ldots, s_n$ are program terms, and $|S|$ denotes the cardinality of the set $S$.
Steps (i-iv) work at the level of quantified conditions, and replace a
quantified condition by a (possibly empty) conjunction of quantified conditions.
The remaining steps work at the level of constraints: step (v) just deletes
a constraint and step (vi) replaces a constraint by $n \geq 0$ constraints. Note
that in (i), the $i$-th argument of $f(se_1, \ldots, se_n)$ is the set expression $se$ that
appears in $f^{-1}_{(i)}(s) \in se$, and the remaining arguments are all $\top$. For
example, if h has arity 3, then $h^{-1}_{(2)}(s_1) \in f(c)$ is replaced by $s_1 \in h(\top, f(c), \top)$. It
is assumed that if a quantified condition appears more than once in a
conjunction constructed by these steps, then copies of the condition are deleted
until only one remains.
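
As an illustration of the condition-level steps, the following sketch applies steps (i) and (ii) of Figure 7.5 exhaustively to a single membership condition $s \in se$. It is a sketch under assumptions rather than the thesis's implementation: terms and set expressions are nested tuples, a projection $f^{-1}_{(i)}(s)$ is encoded as `("proj", f, i, s)`, $\top$ is the string `"TOP"`, the arity table `ARITY` is a hypothetical signature supplied by hand, and steps (iii) to (vi) are not modelled.

```python
ARITY = {"h": 3, "g": 2, "f": 1, "c": 0}   # hypothetical signature
TOP = "TOP"

def reduce_condition(s, se):
    """Rewrite the condition s ∈ se with steps (i) and (ii); returns a
    list of conditions (s', se'), read as a conjunction."""
    if isinstance(s, tuple) and s[0] == "proj":
        # Step (i): f^-1_(i)(s') ∈ se  becomes  s' ∈ f(T, .., se, .., T).
        _, f, i, inner = s
        args = [TOP] * ARITY[f]
        args[i - 1] = se
        return reduce_condition(inner, (f,) + tuple(args))
    if (isinstance(s, tuple) and isinstance(se, tuple)
            and s[0] == se[0] and len(s) == len(se)):
        # Step (ii): f(s1..sn) ∈ f(se1..sen) becomes s1 ∈ se1 ∧ .. ∧ sn ∈ sen.
        out = []
        for si, sei in zip(s[1:], se[1:]):
            for cond in reduce_condition(si, sei):
                if cond not in out:          # drop duplicate conditions
                    out.append(cond)
        return out
    return [(s, se)]                         # neither step applies

# h^-1_(2)(S1) ∈ f(c)  becomes  S1 ∈ h(T, f(c), T)
example = reduce_condition(("proj", "h", 2, "S1"), ("f", ("c",)))
```

The first case mirrors the example in the text, and `reduce_condition(("g", "X", "Y"), ("g", "W", "Z"))` yields the two conditions X ∈ W and Y ∈ Z.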

Where $C$ is a collection of constraints, define REDUCE($C$) to be the
result of exhaustively applying the steps in Figure 7.5 to $C$. We now prove
the correctness of REDUCE. We begin with some observations about the
complement constants. As mentioned earlier, all complement constants used
in our constraints shall be of the form

$\overline{S}$ where each $se \in S$ either has the form $f(\top, \ldots, \top)$
or consists only of function symbols.   (7.21)

In what follows, we shall implicitly assume that this property holds, and
only make reference to it when new complement constants are generated.
To see that REDUCE preserves property (7.21), note that the only steps of
REDUCE that may generate new complement constants are (iv) and (vi).
Now, any new complement constants introduced by step (iv) are of the form
$\overline{S}$ such that $S \subseteq S'$ for some complement constant $\overline{S'}$ that already appears
in the constraints. Similarly, any new complement constants introduced by
step (vi) are of the form $\overline{se}$ such that $se$ is different from $\top$ and such that
a complement symbol of the form $\overline{f(\ldots, se, \ldots)}$ is already present. We now
prove that REDUCE always terminates.

Proposition 26 REDUCE terminates on any collection $C$ of set constraints
(constructed from $OP_2$).

Proof: Consider step (iv). This step serves to replace an occurrence of a
constant $\overline{\{se_1, \ldots, se_m\}}$, where $m \geq 2$, by $m$ constants $\overline{se_1}, \ldots, \overline{se_m}$. Moreover, no
transformation introduces occurrences of constants $\overline{S}$ such that $|S| \geq 2$. It
is clear that there can only be a finite number of applications of step (iv).

Since step (iv) cannot be applied indefinitely, it follows that in an
exhaustive application of steps (i-vi) to $C$, a point must eventually be reached
such that there are no further applications of step (iv). Termination can then
be proved by observing that steps (i), (ii), (iii), (v) and (vi) all reduce the
total number of symbols in the program terms appearing in the quantified
conditions in $C$. To see this, note that in cases (ii) and (vi) a program term
$f(s_1, \ldots, s_n)$ is replaced by $s_i$, $i = 1..n$, in cases (iii) and (v) a program term
is removed, and in case (i) a program term $f^{-1}_{(i)}(s)$ is replaced by $s$. □

We next show that REDUCE preserves standard form and also preserves
the form described by (I-IV). This involves a straightforward verification
for each step of REDUCE.

Proposition  27  If  C  is  in  standard  form  then  so  is  reduce(C). 

Proof: Let $C$ be a collection of standard form constraints. From the
definition of standard form, any quantified conditions appearing in $C$ must be of
the form $s \in a$ or $s \notin a$ such that $a$ is atomic. Now, suppose that one of steps
(i-vi) of REDUCE is applied to $C$. Then, the only new constraints introduced
by this step must be of the form $\mathcal{X} \supseteq \{X : conj\}$ such that each quantified
condition in $conj$ either

1. appears somewhere in $C$;

2. has the form $s \in f(se_1, \ldots, se_n)$ such that $C$ contains a quantified
condition of the form $t \in se$ and each $se_i$ is either $se$ or $\top$ (see step
(i));

3. has the form $s \in se_i$ such that $C$ contains a quantified condition of the
form $t \in f(se_1, \ldots, se_n)$ (see step (ii)), or

4. has the form $s \in \overline{se}$ (see steps (iv) and (vi)).

In each case it is clear that the quantified condition must have the form $s \in a$
or $s \notin a$ where $a$ is atomic. It follows that any new constraints generated by
the step must be in standard form. Hence each step of REDUCE preserves
standard form, and the proposition follows. □

Proposition  28  Let  C  be  a  collection  of  constraints  such  that  each  quan¬ 
tified  condition  appearing  in  C  is  of  one  of  the  forms  (I-IV).  Then  each 
quantified  condition  in  reduce(C)  is  of  one  of  the  forms  (I-IV). 


Proof: It suffices to show that the steps (i-vi) preserve the forms (I-IV).
Now, only steps (i), (ii), (iv) and (vi) can introduce new quantified
conditions. Consider each of these steps in turn. In step (i), the new quantified
condition is of the form $s \in f(\top, \ldots, se, \ldots, \top)$ such that $f^{-1}_{(i)}(s) \in se$ is
either of form (I) or (III). If $f^{-1}_{(i)}(s) \in se$ is of form (I), then $s$ contains only
projections and program variables, and so $s \in f(\top, \ldots, se, \ldots, \top)$ is of form
(I). If $f^{-1}_{(i)}(s) \in se$ is of form (III) then $f(\top, \ldots, se, \ldots, \top)$ is ground, and
so $s \in f(\top, \ldots, se, \ldots, \top)$ is of form (III). In step (ii), the new quantified
conditions are of the form $s_i \in se_i$ such that $f(s_1, \ldots, s_n) \in f(se_1, \ldots, se_n)$
is either of form (III) or (IV), since forms (I) and (II) are not applicable.
Hence either each $s_i$ is a term constructed from function symbols and
variables, or else each $se_i$ is ground and atomic. This means that each $s_i \in se_i$
is either of form (III) or (IV). In steps (iv) and (vi), the new quantified
conditions are of the form $s \in \overline{S}$, and it is immediate that such quantified
conditions are of form (III). □

Combining  the  previous  two  propositions  with  some  simple  observations 
about  the  rewrite  steps  that  make  up  REDUCE  proves  that  REDUCE  achieves 
the  desired  rewriting  of  constraints  into  reduced  form. 


Proposition  29  Let  C  be  a  collection  of  standard  form  constraints  such 
that  each  quantified  condition  appearing  in  C  is  of  one  of  the  forms  (I-IV). 
Then  reduce(C)  is  in  reduced  form. 


Proof:  By  definition,  none  of  the  steps  of  reduce  can  be  applied  to 
reduce(C).  Hence,  if  s  G  sc  appears  in  reduce(C)  then  s  cannot  have 
the  form  f(i^is').  Furthermore,  if  s  has  the  form  f{si,..  .,Sn)  then  sc  can¬ 
not  be  of  the  form  /(•  •  •),  T,  S,  ±  or  g  ^  f-  Hence,  each  quantified 

condition  in  reduce(C)  must  be  of  one  of  the  following  forms: 
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(a)  X  e  se. 

(b)  /(•••)  €  se  where  se  is  either  a  set  variable,  or  of  the  form  sei  U  se2 
or  op(sei,...,se„). 

(c)  s  t  se. 


Now,  by  Proposition  27,  REDUCE(C)  is  in  standard  form  and  so  all  con¬ 
straints  in  R.educe(C)  are  of  the  form  A’  3  o  or  A"  D  op(ai,, .  .,0^)  where 
a,  oi , . . . ,  On  are  atomic.  Hence  the  quantified  conditions  in  reduce(C)  must 
be  of  the  form  s  €  a  or  s  f  a  where  s  is  a  program  term  and  a  is  an  atomic  set 
expression.  Also,  by  Proposition  28,  each  quantified  condition  in  reduce(C) 
must  satisfy  one  of  the  forms  (I-IV).  Hence,  in  case  (a)  above,  se  must  be 
atomic. Now consider case (b). Since $se$ is atomic it must be the case that $se$
is a set variable, say $\mathcal{X}$. Hence $se$ cannot be ground and so $f(\cdots) \in \mathcal{X}$ must
satisfy either (I) or (IV), and it follows that $f(\cdots)$ must consist of function
symbols. In case (c), no further conditions can be established. In summary,
each quantified condition in REDUCE(C) is either of the form (a) $X \in se$, (b)
$s \in \mathcal{X}$ where $s$ consists of program variables and function symbols, or (c)
s  t  se,  and  so  each  conjunction  of  quantified  conditions  in  reduce(C)  is  in 
reduced  form.  [] 

It  remains  to  prove  that  reduce  is  correct.  That  is,  we  seek  to  show 
that $lm(C) = lm(\mathrm{REDUCE}(C))$. Unfortunately this is not always the case.
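For example, consider the constraint

$\mathcal{X} \supseteq \{X : g(g^{-1}_{(1)}(X), X) \in \overline{g(c,b)}\}$    (7.22)

where $b$ and $c$ are constants and $g$ is a binary function symbol. Let $exp$
denote $g(g^{-1}_{(1)}(X), X) \in \overline{g(c,b)}$ and let $\mathcal{I}$ be an interpretation. By definition,
$\rho \in \mathcal{I}(exp)$ iff $\rho \triangleright g^{-1}_{(1)}(X)$ and $\rho(g(g^{-1}_{(1)}(X), X)) \notin \{g(c,b)\}$. Now,
the first condition implies that $\rho(X)$ must be of the form $g(\cdots)$ and this
subsumes the second condition. Hence, $\rho \in \mathcal{I}(exp)$ iff $\rho(X)$ has the form
$g(v_1,v_2)$ for some values $v_1$ and $v_2$. Thus, the least model of the constraint
(7.22) maps $\mathcal{X}$ into the set $\{g(v_1,v_2) : v_1 \text{ and } v_2 \text{ are values}\}$.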

Now, consider the application of REDUCE to the constraint (7.22). This
involves replacing the constraint by $\mathcal{X} \supseteq \{X : g^{-1}_{(1)}(X) \in \overline{c}\}$ and $\mathcal{X} \supseteq \{X : X \in \overline{b}\}$
(using step (vi)), and then replacing $g^{-1}_{(1)}(X) \in \overline{c}$ by $X \in g(\overline{c}, \top)$
(using step (i)). Hence, the final reduced form constraints are:

$\mathcal{X} \supseteq \{X : X \in g(\overline{c}, \top)\}$

$\mathcal{X} \supseteq \{X : X \in \overline{b}\}$

The  least  model  of  these  constraints  maps  X  into  the  set  of  all  values  different 
from  b.  Clearly  the  least  model  of  (7.22)  has  not  been  preserved. 
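
The discrepancy can be checked mechanically on a small finite universe. The following is only a sketch under stated assumptions: values are limited to the constants b and c plus one level of g-terms, complements are taken relative to that universe, and the helper names (`universe`, `original_set`, `reduced_sets`) are hypothetical and not part of the algorithm itself.

```python
from itertools import product

CONSTANTS = ("b", "c")

def universe(depth):
    """All values built from b, c and binary g, nested up to `depth` (a finite stand-in)."""
    values = set(CONSTANTS)
    for _ in range(depth):
        values |= {("g", x, y) for x, y in product(values, repeat=2)}
    return values

def is_g(value):
    return isinstance(value, tuple) and value[0] == "g"

def original_set(values):
    # rho(X) = v qualifies iff g^{-1}_(1)(X) is defined (v is a g-term)
    # and g(g^{-1}_(1)(v), v) differs from g(c, b).
    return {v for v in values if is_g(v) and ("g", v[1], v) != ("g", "c", "b")}

def reduced_sets(values):
    # First reduced constraint: X in g(complement of c, top).
    first = {v for v in values if is_g(v) and v[1] != "c"}
    # Second reduced constraint: X in complement of b.
    # The definedness of g^{-1}_(1)(X) is no longer required here.
    second = {v for v in values if v != "b"}
    return first | second

values = universe(1)
before = original_set(values)
after = reduced_sets(values)
print(sorted(map(str, after - before)))
```

In this universe the only extra value admitted after the two steps is the constant c, which is exactly a value on which $g^{-1}_{(1)}(X)$ is undefined.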

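The problem occurs during the application of step (vi), when (7.22) is
replaced by $\mathcal{X} \supseteq \{X : g^{-1}_{(1)}(X) \in \overline{c}\}$ and $\mathcal{X} \supseteq \{X : X \in \overline{b}\}$. Before the step,
(7.22) is equivalent to

$\mathcal{X} \supseteq \{\rho(X) : (\rho(g^{-1}_{(1)}(X)) \notin \{c\} \text{ or } \rho(X) \notin \{b\}) \text{ and } \rho \triangleright g^{-1}_{(1)}(X)\}$.

However, after the step, the resulting constraints are equivalent to

$\mathcal{X} \supseteq \{\rho(X) : \rho(g^{-1}_{(1)}(X)) \notin \{c\} \text{ and } \rho \triangleright g^{-1}_{(1)}(X)\} \cup \{\rho(X) : \rho(X) \notin \{b\}\}$.

Hence, the least model is not preserved by REDUCE because the condition
$\rho \triangleright g^{-1}_{(1)}(X)$ is not present in the quantified set expression $\{X : X \in \overline{b}\}$. In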
general,  there  are  two  steps  of  reduce  that  are  potentially  incorrect  in  this 
sense,  namely  steps  (iii)  and  (vi).  The  problem  in  both  cases  is  due  to  the 
requirement  that  environments  be  defined  on  each  program  term  appearing 
in  a  quantified  condition. 

The  proof  of  correctness  proceeds  by  showing  that  whenever  one  of  steps 
(iii) or (vi) is applied to a quantified condition $s \in se$ in a conjunction $conj$,
the requirement that $\rho$ is defined on $s$ is in fact redundant because this
requirement essentially appears elsewhere in $conj$. For example, consider
the set constraint

$\mathcal{X} \supseteq \{X : X \in g(\top,\ldots,\top) \wedge g(g^{-1}_{(1)}(X), X) \in \overline{g(c,b)}\}$.

An application of step (vi) to this constraint yields the two constraints

$\mathcal{X} \supseteq \{X : X \in g(\top,\ldots,\top) \wedge X \in g(\overline{c}, \top)\}$

$\mathcal{X} \supseteq \{X : X \in g(\top,\ldots,\top) \wedge X \in \overline{b}\}$

This step is correct because the implicit condition that $X$ must have the
form $g(\cdots)$, which is dropped during this step, is in fact redundant because
it appears elsewhere in the conjunction.

More  generally,  recall  that  p  €  I(conj)  if 
$\rho \triangleright s \wedge \rho(s) \in \mathcal{I}(se)$ for all $s \in se$ in $conj$

and  p  t>  s  A  lv{v  ^  p(s)  A  v  e  T{se))  for  all  s  f  in  ^onj 

Now, consider an alternative definition of $\rho \in \mathcal{I}(conj)$, which omits quantified
conditions that are not defined:

$\rho(s) \in \mathcal{I}(se)$ for all $s \in se$ in $conj$ such that $\rho \triangleright s$

and  3v  (v  ^  p(s)  A  v  €  T(se))  for  all  s  j  i®  such  that  p  t>  s 

Under  certain  circumstances,  these  two  definitions  coincide,  and  this  turns 
out  to  be  the  key  property  for  correctness.  To  formalize  this,  first  extend 
the  t>  notation.  Define  that  p  ezp  holds  if  exp  is  s  6  se  or  s  f  se  and 
$\rho \triangleright s$. In other words, $\rho \triangleright exp$ denotes the condition that the program term
appearing in $exp$ is defined. Now, define that a quantified set expression is
safe with respect to an interpretation as follows.

Definition 18 A quantified set expression $\{X : conj\}$ is safe with respect to
an interpretation $\mathcal{I}$ if, for each $\rho \notin \mathcal{I}(conj)$, $conj$ contains a quantified
condition $exp_\rho$ such that $\rho \triangleright exp_\rho$ and $\rho \notin \mathcal{I}(exp_\rho)$. A collection of set
constraints is safe with respect to $\mathcal{I}$ if all of the quantified set expressions it
contains are safe with respect to $\mathcal{I}$.

In other words $\{X : conj\}$ is safe with respect to $\mathcal{I}$ if $\rho \notin \mathcal{I}(conj)$ implies that
there is some quantified condition that is defined under $\rho$ but not satisfied
by $\rho$, and this means that $conj$ contains no implicit information in the
definedness of program terms. Hence, to determine the relation $\rho \in \mathcal{I}(conj)$,
all undefined quantified conditions can be safely ignored.
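
The safeness condition can be illustrated by brute force over a finite set of environments. This is a minimal sketch, not the thesis algorithm: it handles only membership conditions, represents each set expression directly by the finite set of values it denotes, and all names (`evaluate`, `satisfied`, `is_safe`, the tuple encoding of terms) are assumptions made for the illustration.

```python
def evaluate(term, env):
    """Return the value of a program term under env, or None if it is undefined."""
    kind = term[0]
    if kind == "var":                       # ("var", name)
        return env.get(term[1])
    if kind == "proj":                      # ("proj", f, i, subterm) stands for f^{-1}_(i)(subterm)
        _, f, i, sub = term
        value = evaluate(sub, env)
        if isinstance(value, tuple) and value[0] == f:
            return value[i]                 # arguments sit at positions 1..n of the tuple
        return None
    raise ValueError(kind)

def satisfied(cond, env):
    """A condition (term, allowed) holds iff the term is defined and its value is allowed."""
    term, allowed = cond
    value = evaluate(term, env)
    return value is not None and value in allowed

def is_safe(conj, envs):
    """Every env that fails conj must fail some condition that is defined under it."""
    for env in envs:
        if all(satisfied(c, env) for c in conj):
            continue
        if not any(evaluate(c[0], env) is not None and not satisfied(c, env) for c in conj):
            return False
    return True

G_TERMS = {("g", "c", "b"), ("g", "b", "b")}
ENVS = [{"X": v} for v in ["b", "c", *sorted(G_TERMS)]]
X = ("var", "X")
PROJ = ("proj", "g", 1, X)
UNSAFE = [(PROJ, {"c"})]                    # relies on implicit definedness of the projection
SAFE = [(X, G_TERMS), (PROJ, {"c"})]        # definedness is stated explicitly by X in g(...)
print(is_safe(UNSAFE, ENVS), is_safe(SAFE, ENVS))
```

The first conjunction fails the check because the environment mapping X to b violates it only through undefinedness; adding the explicit condition that X is a g-term restores safeness.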

We  now  proceed  with  the  proof  of  correctness  for  reduce.  In  essence 
we  shall  prove  that  each  step  of  REDUCE  is  correct  in  the  sense  that  if  it 
replaces a set expression $se_{old}$ by $se_{new}$ and $se_{old}$ is safe with respect to an
interpretation $\mathcal{I}$ then $\mathcal{I}(se_{new}) = \mathcal{I}(se_{old})$. In order to show that an entire
application of REDUCE is correct, we shall also need to argue that each step
preserves safeness. We begin by proving a somewhat abstract property about
preservation of safety that will be used repeatedly in the propositions that
follow. In essence, the property considers the replacement of a quantified set
expression of the form $\{X : conj \wedge exp\}$ with $\{X : conj \wedge exp_1 \wedge \cdots \wedge exp_n\}$.

Proposition 30 Let $conj$ be a conjunction of quantified conditions and let
$exp, exp_1, \ldots, exp_n$ be individual quantified conditions. If $\mathcal{I}$ is an interpretation such that:

(a) $(\forall \rho \triangleright exp)\ (\rho \in \mathcal{I}(exp_1 \wedge \cdots \wedge exp_n)$ implies $\rho \in \mathcal{I}(exp))$,

(b) $\rho \triangleright exp$ implies $\bigwedge_{i=1..n} \rho \triangleright exp_i$,

(c) $\{X : conj \wedge exp\}$ is safe with respect to $\mathcal{I}$,

then $\{X : conj \wedge exp_1 \wedge \cdots \wedge exp_n\}$ is safe with respect to $\mathcal{I}$.

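Proof: To show that $\{X : conj \wedge exp_1 \wedge \cdots \wedge exp_n\}$ is safe with respect to $\mathcal{I}$,
suppose that $\rho \notin \mathcal{I}(conj \wedge exp_1 \wedge \cdots \wedge exp_n)$; we need to show that there
exists a quantified condition $exp_\rho$ in $conj \wedge exp_1 \wedge \cdots \wedge exp_n$ such that $\rho \triangleright exp_\rho$
and $\rho \notin \mathcal{I}(exp_\rho)$. Now, either (1) $\rho \notin \mathcal{I}(conj)$ or (2) $\rho \notin \mathcal{I}(exp_1 \wedge \cdots \wedge exp_n)$.

In case (1), $\rho \notin \mathcal{I}(conj \wedge exp)$, and so by assumption (c), there must exist
$exp_\rho$ in $conj \wedge exp$ such that $\rho \triangleright exp_\rho$ and $\rho \notin \mathcal{I}(exp_\rho)$. Now, consider two
subcases. Either $exp_\rho$ appears in $conj$, in which case the proof is complete,
or else $exp_\rho$ is $exp$. In this latter case, $\rho \triangleright exp$ and $\rho \notin \mathcal{I}(exp)$, and combining
this with (a) proves that $\rho \notin \mathcal{I}(exp_1 \wedge \cdots \wedge exp_n)$. It follows that, for some
$i$, $\rho \notin \mathcal{I}(exp_i)$. Moreover, $\rho \triangleright exp$ and it follows from (b) that $\rho \triangleright exp_i$.
Hence $exp_i$ is the required quantified condition.

In case (2), $\rho \notin \mathcal{I}(exp_1 \wedge \cdots \wedge exp_n)$, and so there must exist some
$i$ such that $\rho \notin \mathcal{I}(exp_i)$. Again consider two subcases. If $\rho \triangleright exp_i$ then
the proof is complete. Otherwise, if it is not the case that $\rho \triangleright exp_i$, then
(b) implies that $\rho \triangleright exp$ does not hold, and this in turn implies that
$\rho \notin \mathcal{I}(conj \wedge exp)$. Since $\{X : conj \wedge exp\}$ is assumed to be safe with respect
to $\mathcal{I}$ and $\rho \notin \mathcal{I}(conj \wedge exp)$, there is some $exp_\rho$ in $conj \wedge exp$ such that $\rho \triangleright exp_\rho$
and $\rho \notin \mathcal{I}(exp_\rho)$. Now $exp_\rho$ cannot be $exp$ because of the assumption that
$\rho \triangleright exp$ does not hold. Hence $exp_\rho$ must appear in $conj$ and hence in
$conj \wedge exp_1 \wedge \cdots \wedge exp_n$. []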

The  following  two  propositions  form  the  core  part  of  the  proof  of  the 
correctness  of  reduce.  In  essence,  they  show  that  each  step  is  correct  and 
preserves  safeness. 

Proposition 31 If one of steps (i-iv) of REDUCE is applied to a collection
of constraints then the effect of the step is to replace a constraint $\mathcal{X} \supseteq \{X : conj_{old}\}$
by $\mathcal{X} \supseteq \{X : conj_{new}\}$ such that, for any interpretation $\mathcal{I}$,

(a) $\mathcal{I}(\{X : conj_{old}\}) \subseteq \mathcal{I}(\{X : conj_{new}\})$, and

(b) if $\{X : conj_{old}\}$ is safe with respect to $\mathcal{I}$ then

(b1) $\mathcal{I}(\{X : conj_{old}\}) = \mathcal{I}(\{X : conj_{new}\})$, and
(b2) $\{X : conj_{new}\}$ is safe with respect to $\mathcal{I}$.

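Proof: If step (i) is used, then $conj_{old}$ is $conj \wedge f^{-1}_{(i)}(s) \in se$, and $conj_{new}$
is $conj \wedge s \in f(\top,\ldots,\top,se,\top,\ldots,\top)$ where $se$ appears at the $i$-th argument
of $f$. Now let $\mathcal{I}$ be an arbitrary interpretation and consider the following
equivalences:

$\rho \in \mathcal{I}(f^{-1}_{(i)}(s) \in se)$

iff $\rho \triangleright f^{-1}_{(i)}(s) \wedge \rho(f^{-1}_{(i)}(s)) \in \mathcal{I}(se)$

iff $\rho \triangleright s \wedge \rho(s)$ is $f(v_1,\ldots,v_n) \wedge v_i \in \mathcal{I}(se)$

iff $\rho \triangleright s \wedge \rho(s)$ is $f(v_1,\ldots,v_n) \wedge v_i \in \mathcal{I}(se) \wedge \bigwedge_{j \neq i} v_j \in \mathcal{I}(\top)$

iff $\rho \triangleright s \wedge \rho(s) \in \mathcal{I}(f(se_1,\ldots,se_n))$, where $se_i$ is $se$ and $se_j$ is $\top$ for $j \neq i$

iff $\rho \in \mathcal{I}(s \in f(se_1,\ldots,se_n))$

These equivalences imply that, for all interpretations $\mathcal{I}$, $\rho \in \mathcal{I}(conj_{old})$ iff
$\rho \in \mathcal{I}(conj_{new})$. Hence $\mathcal{I}(\{X : conj_{old}\}) = \mathcal{I}(\{X : conj_{new}\})$, and this
proves (a) and (b1). It remains to show (b2). Suppose that $\{X : conj_{old}\}$ is
safe with respect to $\mathcal{I}$. Then clearly the following properties hold:

• $\mathcal{I}(f^{-1}_{(i)}(s) \in se) = \mathcal{I}(s \in f(se_1,\ldots,se_n))$

• $\rho \triangleright f^{-1}_{(i)}(s)$ implies $\rho \triangleright s$,

• $\{X : conj \wedge f^{-1}_{(i)}(s) \in se\}$ is safe with respect to $\mathcal{I}$

and hence the preconditions of Proposition 30 are satisfied. It follows that
$\{X : conj_{new}\}$ is safe with respect to $\mathcal{I}$.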
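
The equivalence behind step (i) can be checked exhaustively on a small universe. The sketch below makes several assumptions for illustration only: a single binary symbol g, a finite value set standing in for $\top$, the program term $s$ taken to be a variable whose value is passed in directly, and hypothetical helper names.

```python
from itertools import product

CONSTANTS = ("b", "c")
G_TERMS = {("g", x, y) for x, y in product(CONSTANTS, repeat=2)}
VALUES = set(CONSTANTS) | G_TERMS          # finite stand-in for the set of all values (top)

def proj(f, i, value):
    """f^{-1}_(i)(value): the i-th argument if value is an f-term, else None (undefined)."""
    if isinstance(value, tuple) and value[0] == f:
        return value[i]
    return None

def old_condition(value, se, i):
    # g^{-1}_(i)(X) in se, with X bound to `value`
    arg = proj("g", i, value)
    return arg is not None and arg in se

def new_condition(value, se, i):
    # X in g(top, ..., se, ..., top), with se at argument position i
    slots = [VALUES, VALUES]
    slots[i - 1] = se
    return isinstance(value, tuple) and value[0] == "g" and all(
        arg in slot for arg, slot in zip(value[1:], slots))

def step_i_agrees(se, i):
    """True iff both conditions accept exactly the same values of X."""
    return all(old_condition(v, se, i) == new_condition(v, se, i) for v in VALUES)

print(all(step_i_agrees(se, i) for se in (set(), {"c"}, {"b", "c"}) for i in (1, 2)))
```

Both conditions reject every non-g value: the old one through undefinedness of the projection, the new one through the explicit shape $g(\cdots)$, which is why this step needs no safeness assumption.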

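If step (ii) is used, then $conj_{old}$ is $conj \wedge f(s_1,\ldots,s_n) \in f(se_1,\ldots,se_n)$
and $conj_{new}$ is $conj \wedge s_1 \in se_1 \wedge \cdots \wedge s_n \in se_n$. Now let $\mathcal{I}$ be an arbitrary
interpretation and consider the following equivalences:

$\rho \in \mathcal{I}(f(s_1,\ldots,s_n) \in f(se_1,\ldots,se_n))$

iff $\rho \triangleright f(s_1,\ldots,s_n) \wedge \rho(f(s_1,\ldots,s_n)) \in \mathcal{I}(f(se_1,\ldots,se_n))$

iff $\bigwedge_{i=1..n} \rho \triangleright s_i \wedge \bigwedge_{i=1..n} \rho(s_i) \in \mathcal{I}(se_i)$

iff $\bigwedge_{i=1..n} (\rho \triangleright s_i \wedge \rho(s_i) \in \mathcal{I}(se_i))$

iff $\bigwedge_{i=1..n} \rho \in \mathcal{I}(s_i \in se_i)$

iff $\rho \in \mathcal{I}(s_1 \in se_1 \wedge \cdots \wedge s_n \in se_n)$

These equivalences imply that, for all interpretations $\mathcal{I}$, $\rho \in \mathcal{I}(conj_{old})$ iff
$\rho \in \mathcal{I}(conj_{new})$. Hence $\mathcal{I}(\{X : conj_{old}\}) = \mathcal{I}(\{X : conj_{new}\})$, and this
proves (a) and (b1).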
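
The decomposition used in step (ii) can be exercised in the same brute-force style. This is a sketch under assumptions chosen for the example: the terms $s_i$ are plain variables, environments may leave a variable undefined, set expressions are given as finite sets, and the function names are hypothetical.

```python
from itertools import product

VALUES = ("a", "b", "c")

def apply_f(env, variables):
    """Value of f(X1,...,Xn) under env, or None if some Xi is undefined."""
    args = [env.get(x) for x in variables]
    return None if None in args else ("f", *args)

def f_of_sets(sets):
    """Interpretation of f(se_1,...,se_n): all f-terms whose i-th argument lies in se_i."""
    return {("f", *args) for args in product(*sets)}

def old_condition(env, variables, sets):
    # f(X1,...,Xn) in f(se_1,...,se_n)
    value = apply_f(env, variables)
    return value is not None and value in f_of_sets(sets)

def new_condition(env, variables, sets):
    # X1 in se_1 and ... and Xn in se_n, each requiring Xi to be defined
    return all(env.get(x) is not None and env[x] in s for x, s in zip(variables, sets))

def partial_envs(variables):
    """All environments that map each variable to a value or leave it undefined."""
    for combo in product(VALUES + (None,), repeat=len(variables)):
        yield {x: v for x, v in zip(variables, combo) if v is not None}

def step_ii_agrees(sets):
    names = ["X%d" % k for k in range(len(sets))]
    return all(old_condition(e, names, sets) == new_condition(e, names, sets)
               for e in partial_envs(names))

print(step_ii_agrees([{"a", "b"}, {"c"}]), step_ii_agrees([set(), {"a"}]))
```

Because definedness of $f(s_1,\ldots,s_n)$ coincides with definedness of every $s_i$, the two sides agree even on partial environments.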

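To prove (b2), assume that $\{X : conj_{old}\}$ is safe with respect to $\mathcal{I}$.
Clearly the following properties hold:

• $\mathcal{I}(f(s_1,\ldots,s_n) \in f(se_1,\ldots,se_n)) = \mathcal{I}(s_1 \in se_1 \wedge \cdots \wedge s_n \in se_n)$,

• $\rho \triangleright f(s_1,\ldots,s_n)$ implies $\rho \triangleright s_1 \wedge \cdots \wedge \rho \triangleright s_n$

• $\{X : conj_{old}\}$ is safe with respect to $\mathcal{I}$

and so Proposition 30 proves that $\{X : conj_{new}\}$ is safe with respect to $\mathcal{I}$.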

If step (iii) is used, then $conj_{old}$ is $conj \wedge s_1 \in se_1$ where $s_1$ is of the form
$f(\cdots)$ and $se_1$ is either $\top$ or of the form $\overline{g(\cdots)}$ such that $f \neq g$, and $conj_{new}$
is $conj$. Clearly

$\rho \in \mathcal{I}(conj \wedge s_1 \in se_1)$ implies $\rho \in \mathcal{I}(conj)$

and it follows that $\mathcal{I}(\{X : conj_{old}\}) \subseteq \mathcal{I}(\{X : conj\})$. This proves (a). Now
consider (b1) and suppose that $\{X : conj_{old}\}$ is safe with respect to $\mathcal{I}$, and
consider the following property:

$\rho \in \mathcal{I}(conj)$ implies $\rho \triangleright s_1$    (7.23)

To prove (7.23), suppose that $\rho \triangleright s_1$ does not hold. This implies that
$\rho \notin \mathcal{I}(conj_{old})$. Since $se_{old}$ is safe with respect to $\mathcal{I}$, there exists a quantified
condition $exp_\rho$ in $conj_{old}$ such that $\rho \triangleright exp_\rho$ and $\rho \notin \mathcal{I}(exp_\rho)$. Since $exp_\rho$
cannot be $s_1 \in se_1$, it must be the case that $exp_\rho$ appears in $conj$. Hence
$\rho \notin \mathcal{I}(conj)$. This completes the proof of (7.23).

Now, consider the condition $s_1 \in se_1$. If $\rho \triangleright s_1$, then it is clear that
$\rho(s_1) \in \mathcal{I}(se_1)$ because if $se_1$ is either $\top$ or $\overline{g(\cdots)}$ then $\mathcal{I}(se_1)$ contains all
values of the form $f(\cdots)$. Hence $\rho \in \mathcal{I}(s_1 \in se_1)$ iff $\rho \triangleright s_1$. Combining this
with (7.23) proves that

$\rho \in \mathcal{I}(conj)$ implies $\rho \in \mathcal{I}(conj \wedge s_1 \in se_1)$

and so $\mathcal{I}(\{X : conj_{new}\}) \subseteq \mathcal{I}(\{X : conj_{old}\})$. Combining this with (a)
proves (b1).

To prove (b2), assume that $\{X : conj_{old}\}$ is safe with respect to $\mathcal{I}$. Now,
we have already proved that $\rho \in \mathcal{I}(s_1 \in se_1)$ iff $\rho \triangleright s_1$, and so if $\rho \triangleright s_1$
then $\rho \in \mathcal{I}(s_1 \in se_1)$ implies $\rho \in \mathcal{I}(true)$. Hence the three preconditions
of Proposition 30 hold (note that the second is vacuous since $n = 0$) and it
follows that $\{X : conj_{new}\}$ is safe with respect to $\mathcal{I}$.

If  step  (iv)  is  used,  then  conj^^  is  conj  A  /(si , . . . ,  Sn)  €  {sei  >  •  •  •  > 
and  conj^  is  conj  A  /(si,...,Sn)  €  A  •  •  •  A /(si,. . .  ,s„)  e  It  is 
easy  to  verify  that  if  p  >  /(si, . . .  ,Sn)  then: 

p(/(^i>  •  •  •  »®n))  €  T({sei, . . .  ,sCn»})  iff  (7  24) 

p(/(Si, . . .  ,Sn))  €  T(s€i)  a  •  •  •  a  p(/(Si,  . . .  *Sn))  €  T(sCto) 

and  this  implies  that  J{{X  :  conj^^})  =  X{{X  :  conj^g^}),  for  all  inter¬ 
pretations  J,  and  hence  proves  (a)  and  (bl).  To  prove  (b2),  assume  that 
{X  :  conjgi^}  is  safe  with  respect  to  X.  Now,  it  follows  from  (7.24)  that 

X  ^/(si, . . .  ,Sn)  €  {sci, •  • .  jSCto}^  — 

X  ^/(si  ,...,Sn)  €  SCi  A  •••A  ^(si , . .  • »  Sn )  G  SCm^  . 

Also,  it  is  clear  that 

P  >  (/(Si,...,Sn)  €  {s€i,...,sc,„})  iff 
P  ^  ^^(Sl , . . .  ,  Sn)  €  SCi  A  •  •  •  , Sn)  €  SCm^  . 

This establishes the preconditions of Proposition 30. Hence $\{X : conj_{new}\}$
is safe with respect to $\mathcal{I}$. []




Proposition 32 If one of steps (v) or (vi) of REDUCE is applied to a
collection of constraints then the effect of the step is to replace a constraint
$\mathcal{X} \supseteq \{X : conj_{old}\}$ by $n \geq 0$ constraints $\mathcal{X} \supseteq \{X : conj_1\}, \ldots, \mathcal{X} \supseteq \{X : conj_n\}$
such that, for any interpretation $\mathcal{I}$,

(a) $\mathcal{I}(\{X : conj_{old}\}) \subseteq \mathcal{I}(\{X : conj_1\} \cup \cdots \cup \{X : conj_n\})$, and

(b) if $\{X : conj_{old}\}$ is safe with respect to $\mathcal{I}$ then

(b1) $\mathcal{I}(\{X : conj_{old}\}) = \mathcal{I}(\{X : conj_1\} \cup \cdots \cup \{X : conj_n\})$, and
(b2) each $\{X : conj_i\}$ is safe with respect to $\mathcal{I}$.


Proof: If step (v) is used, then the effect of the step is to delete $\mathcal{X} \supseteq \{X : conj_{old}\}$
(i.e. $n = 0$). The proofs for (a) and (b1) follow from the fact that
there do not exist environments $\rho$ and interpretations $\mathcal{I}$ such that $\rho \triangleright f(\cdots)$
and $\rho(f(\cdots)) \in \mathcal{I}(g(\cdots))$. Hence $\mathcal{I}(\{X : conj_{old}\})$ is the empty set, for all
$\mathcal{I}$. Moreover, condition (b2) is vacuous.

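If step (vi) is used, then $conj_{old}$ is $conj \wedge f(s_1,\ldots,s_n) \in \overline{f(se_1,\ldots,se_n)}$ and
each $conj_i$ is $conj \wedge s_i \in \overline{se_i}$. The proof essentially follows from the following
chain of equivalences in which $\rho$ is an environment such that $\rho \triangleright f(s_1,\ldots,s_n)$:

$\rho \in \mathcal{I}(f(s_1,\ldots,s_n) \in \overline{f(se_1,\ldots,se_n)})$

iff $\rho(f(s_1,\ldots,s_n)) \in \mathcal{I}(\overline{f(se_1,\ldots,se_n)})$

iff $\rho(f(s_1,\ldots,s_n)) \notin \mathcal{I}(f(se_1,\ldots,se_n))$

iff $\exists i\ (\rho(s_i) \notin \mathcal{I}(se_i))$

iff $\exists i\ (\rho(s_i) \in \mathcal{I}(\overline{se_i}))$

iff $\exists i\ \rho \in \mathcal{I}(s_i \in \overline{se_i})$.    (7.25)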
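
The middle step of this chain, that an f-term lies outside $f(se_1,\ldots,se_n)$ exactly when some argument lies outside the corresponding $se_i$, can be confirmed by enumeration. The sketch assumes a three-element value universe, complements taken relative to it, an environment already defined on $f(s_1,\ldots,s_n)$, and hypothetical function names.

```python
from itertools import product

VALUES = ("a", "b", "c")

def f_of_sets(sets):
    """Interpretation of f(se_1,...,se_n) as a finite set of f-terms."""
    return {("f", *args) for args in product(*sets)}

def in_complement_of_f(args, sets):
    # f(v_1,...,v_n) lies in the complement of f(se_1,...,se_n)
    return ("f", *args) not in f_of_sets(sets)

def some_argument_in_complement(args, sets):
    # there exists i such that v_i lies in the complement of se_i
    return any(v not in s for v, s in zip(args, sets))

def step_vi_agrees(sets):
    return all(in_complement_of_f(args, sets) == some_argument_in_complement(args, sets)
               for args in product(VALUES, repeat=len(sets)))

print(step_vi_agrees([{"a"}, {"b", "c"}]), step_vi_agrees([set(), {"a"}]))
```

The existential on the right-hand side is what turns one constraint into $n$ constraints, one per argument position.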

To prove (a), suppose that $v \in \mathcal{I}(\{X : conj_{old}\})$. This implies that for some
$\rho$, $\rho(X) = v$ and $\rho \in \mathcal{I}(conj_{old})$. Hence $\rho \in \mathcal{I}(conj)$, $\rho \triangleright f(s_1,\ldots,s_n)$ and
$\rho \in \mathcal{I}(f(s_1,\ldots,s_n) \in \overline{f(se_1,\ldots,se_n)})$. The chain of equivalences (7.25)
proves that there is an $i$ such that $\rho \in \mathcal{I}(s_i \in \overline{se_i})$. Also, $\rho \triangleright f(s_1,\ldots,s_n)$
implies that $\rho \triangleright s_i$. Hence $\rho \in \mathcal{I}(conj \wedge s_i \in \overline{se_i})$. This means that, for
some $i$, $v \in \mathcal{I}(\{X : conj_i\})$. Hence $v \in \mathcal{I}(\{X : conj_1\} \cup \cdots \cup \{X : conj_n\})$,
and this proves (a).

Consider (b1), and assume that $\{X : conj_{old}\}$ is safe with respect to $\mathcal{I}$.
It is first necessary to show that

$\rho \in \mathcal{I}(conj)$ implies $\rho \triangleright s_i$    (7.26)

where $i$ ranges from 1 to $n$. To prove (7.26), fix $i$ and suppose that $\rho \triangleright s_i$
does not hold. This means that $\rho$ is not defined on $f(s_1,\ldots,s_n)$, and so
$\rho \notin \mathcal{I}(conj_{old})$. Since $conj_{old}$ is safe with respect to $\mathcal{I}$, there exists a quantified
condition $exp_\rho$ in $conj_{old}$ such that $\rho \triangleright exp_\rho$ and $\rho \notin \mathcal{I}(exp_\rho)$. Since $exp_\rho$
cannot be $f(s_1,\ldots,s_n) \in \overline{f(se_1,\ldots,se_n)}$, it must be the case that $exp_\rho$
appears in $conj$. Hence $\rho \notin \mathcal{I}(conj)$. This completes the proof of (7.26).

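To complete the proof of (b1), suppose that $v \in \mathcal{I}(\{X : conj_i\})$, for
some $i$, $i = 1..n$. This implies that there is an environment $\rho$ such that
$\rho \in \mathcal{I}(conj)$ and $\rho \in \mathcal{I}(s_i \in \overline{se_i})$. From the implication (7.26), it follows
that $\rho \triangleright f(s_1,\ldots,s_n)$. Combining this with $\rho \in \mathcal{I}(s_i \in \overline{se_i})$ and the chain
of equivalences (7.25) proves that $\rho \in \mathcal{I}(f(s_1,\ldots,s_n) \in \overline{f(se_1,\ldots,se_n)})$.
Hence $\rho \in \mathcal{I}(conj_{old})$ and so $v \in \mathcal{I}(\{X : conj_{old}\})$.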

Finally, consider (b2) and assume that $\{X : conj_{old}\}$ is safe with respect
to $\mathcal{I}$. Now, the chain of equivalences (7.25) can be used to verify that, for
all environments such that $\rho \triangleright f(s_1,\ldots,s_n)$,

$\rho \in \mathcal{I}(s_i \in \overline{se_i})$ implies $\rho \in \mathcal{I}(f(s_1,\ldots,s_n) \in \overline{f(se_1,\ldots,se_n)})$

Moreover, $\rho \triangleright f(s_1,\ldots,s_n)$ implies $\rho \triangleright s_i$, for $i = 1..n$. Hence, Proposition
30 can be applied to prove that $\{X : conj \wedge s_i \in \overline{se_i}\}$ is safe with respect to
$\mathcal{I}$. []

The  following  lemma  combines  these  two  propositions  with  Propositions 
26  and  29  to  prove  the  correctness  of  reduce.  In  essence,  this  lemma  says 
that  REDUCE  terminates,  and  produces  reduced  form  constraints  whose  least 
model  is  the  same  as  the  least  model  of  the  input  constraints.  However,  it 
is  convenient  to  prove  the  lemma  in  a  somewhat  more  general  form  because 
REDUCE  is  used  in  a  variety  of  different  contexts  during  the  algorithm.  Note 
that  it  is  straightforward  to  combine  the  third  and  fourth  parts  of  the  lemma 
to prove that REDUCE preserves least models, and this proof is contained in
the corollary immediately following the lemma. Also note that the last part
of the lemma describes safeness properties of the result of applying REDUCE,
and  this  is  needed  because  reduce  will  in  general  be  applied  many  times 
during  the  algorithm. 


Lemma 14 (Correctness of Reduce) Let $C$ be a collection of constraints
and let $\mathcal{I}$ be an interpretation. Then:

• REDUCE terminates on $C$;

• if $C$ is in standard form and each quantified condition in $C$ is of one
of the forms (I-IV) then REDUCE($C$) is in reduced form;

• if $v \in \mathcal{I}(se)$ for some constraint $\mathcal{Y} \supseteq se$ in $C$ then there is a constraint
$\mathcal{Y} \supseteq se'$ in REDUCE($C$) such that $v \in \mathcal{I}(se')$;

• if $C$ is safe w.r.t. $\mathcal{I}$ and $v \in \mathcal{I}(se)$ for some constraint $\mathcal{Y} \supseteq se$ in
REDUCE($C$) then there is a constraint $\mathcal{Y} \supseteq se'$ in $C$ such that $v \in \mathcal{I}(se')$,
and

• if $C$ is safe w.r.t. $\mathcal{I}$ then REDUCE($C$) is safe w.r.t. $\mathcal{I}$.

Proof: The first two parts of the lemma are just restatements of Propositions
26 and 29. The remaining three parts essentially follow from repeated
applications of Propositions 31 and 32. To summarize these two
propositions, let $C$ be a collection of constraints and consider a single
application of one of the steps that make up REDUCE. Let $C'$ be the result
of the application of this step. Note that each step of REDUCE can be
thought of as replacing one constraint by a (possibly empty) collection
of constraints. Let the replaced constraint be $\mathcal{X} \supseteq se_{\mathcal{X}}$ and let the
collection of constraints be $\mathcal{X} \supseteq se_1, \ldots, \mathcal{X} \supseteq se_n$, $n \geq 0$. That is, $C'$ is
$(C - \{\mathcal{X} \supseteq se_{\mathcal{X}}\}) \cup \{\mathcal{X} \supseteq se_1, \ldots, \mathcal{X} \supseteq se_n\}$. Propositions 31 and 32 imply
that:

(a) $\mathcal{I}(se_{\mathcal{X}}) \subseteq \mathcal{I}(se_1 \cup \cdots \cup se_n)$, and

(b) if $C$ is safe with respect to $\mathcal{I}$ then

(b1) $\mathcal{I}(se_{\mathcal{X}}) = \mathcal{I}(se_1 \cup \cdots \cup se_n)$, and
(b2) $se_1, \ldots, se_n$ are safe with respect to $\mathcal{I}$.
Using (a), (b1) and (b2), the following fact can be established: if $C'$ is
obtained from $C$ by one application of the steps that compose REDUCE then

(1) if $v \in \mathcal{I}(se)$ for some constraint $\mathcal{Y} \supseteq se$ in $C$ then there is a constraint
$\mathcal{Y} \supseteq se'$ in $C'$ such that $v \in \mathcal{I}(se')$;

(2) if $C$ is safe w.r.t. $\mathcal{I}$ and $v \in \mathcal{I}(se)$ for some constraint $\mathcal{Y} \supseteq se$ in $C'$
then there is a constraint $\mathcal{Y} \supseteq se'$ in $C$ such that $v \in \mathcal{I}(se')$, and

(3) if $C$ is safe w.r.t. $\mathcal{I}$ then $C'$ is safe w.r.t. $\mathcal{I}$.

To prove (1), suppose that $v \in \mathcal{I}(se)$ for some constraint $\mathcal{Y} \supseteq se$ in $C$. If this
constraint is in fact $\mathcal{X} \supseteq se_{\mathcal{X}}$ then $v \in \mathcal{I}(se_1 \cup \cdots \cup se_n)$ by (a). Hence, for
some $i$, $v \in \mathcal{I}(se_i)$. Since $C'$ contains $\mathcal{X} \supseteq se_i$, $i = 1..n$, the proof is complete.
On the other hand, if $\mathcal{Y} \supseteq se$ is different from $\mathcal{X} \supseteq se_{\mathcal{X}}$, then $\mathcal{Y} \supseteq se$ appears
in $C'$ and so the proof is immediate.

To prove (2), suppose that $C$ is safe with respect to $\mathcal{I}$ and suppose that
$v \in \mathcal{I}(se)$ for some constraint $\mathcal{Y} \supseteq se$ in $C'$. If this constraint is one of the
constraints $\mathcal{X} \supseteq se_i$, $i = 1..n$, then $v \in \mathcal{I}(se_1 \cup \cdots \cup se_n)$ and so by (b1),
$v \in \mathcal{I}(se_{\mathcal{X}})$. Since $C$ contains $\mathcal{X} \supseteq se_{\mathcal{X}}$, the proof for this case is complete.
On the other hand, if $\mathcal{Y} \supseteq se$ is different from the $\mathcal{X} \supseteq se_i$, then $\mathcal{Y} \supseteq se$
appears in $C$ and so the proof is immediate.

To prove (3), let $se$ be a quantified set expression in $C'$. If $se$ is one of
the $se_i$, then it follows from (b2) that $se$ is safe with respect to $\mathcal{I}$. On the
other hand, if $se$ is not one of the $se_i$, then $se$ appears in $C$, and hence $se$ is
safe with respect to $\mathcal{I}$ because $C$ is safe with respect to $\mathcal{I}$.

The proof of the lemma can now be completed by applying this fact to
each step performed during REDUCE($C$) and chaining the results together.
[]

Corollary 1 Let $C$ be a collection of constraints. If $C$ is safe with respect
to $lm(C)$ then $lm(\mathrm{REDUCE}(C)) = lm(C)$.

Proof: Let $\mathcal{I}$ denote $lm(C)$ and consider a constraint $\mathcal{X} \supseteq se$ in REDUCE($C$).
Let $v$ be any value in $\mathcal{I}(se)$. By the fourth part of Lemma 14 (noting that $C$ is safe
with respect to $\mathcal{I} = lm(C)$), there is a constraint $\mathcal{X} \supseteq se'$ in $C$ such that
$v \in \mathcal{I}(se')$. Since $\mathcal{I}$ is a model of $C$, it follows that $v \in \mathcal{I}(\mathcal{X})$. Hence $\mathcal{I}$ is a
model of $\mathcal{X} \supseteq se$, and since this was an arbitrary constraint in REDUCE($C$),
$\mathcal{I} \models \mathrm{REDUCE}(C)$. Thus $lm(C) \supseteq lm(\mathrm{REDUCE}(C))$.

Conversely, let $\mathcal{I}$ denote $lm(\mathrm{REDUCE}(C))$ and consider a constraint $\mathcal{X} \supseteq se$
in $C$. If $v \in \mathcal{I}(se)$ then by the third part of Lemma 14 (noting that safeness
is not required here) there is a constraint $\mathcal{X} \supseteq se'$ in REDUCE($C$) such that
$v \in \mathcal{I}(se')$. Since $\mathcal{I}$ is a model of REDUCE($C$), it follows that $v \in \mathcal{I}(\mathcal{X})$.
Hence $\mathcal{I}$ is a model of each constraint in $C$, and so $lm(C) \subseteq lm(\mathrm{REDUCE}(C))$.
[]

This completes the proof of correctness for REDUCE. Now, the set constraints
$SC_P$ for a program $P$ must be initially put into reduced form. This
can be achieved by first applying STANDARDIZE (to put them into standard
form) and then applying REDUCE (to put them into reduced form). We now
prove that this initialization process is correct. The main part of this is to
prove that $SC_P$ is safe with respect to $lm(SC_P)$.


Lemma 15 (Initialization) Let $SC_P$ be the set constraints for a program
$P$, and let $C_0$ be REDUCE(STANDARDIZE($SC_P$)). Then

(a) $C_0$ is in reduced form;

(b) $lm(C_0) =_{var(SC_P)} lm(SC_P)$;

(c) $C_0$ is safe with respect to $lm(C_0)$.

Proof: First consider (a). By Lemma 14, we only need to show that
each quantified condition in STANDARDIZE($SC_P$) is of one of the forms (I-IV).
Now, it has already been argued that the quantified conditions in $SC_P$
are of one of the forms (I-IV). When STANDARDIZE is applied to $SC_P$ the only step that
may alter quantified set expressions is the step that replaces a set expression
by a new set variable and adds a constraint between the new set variable
and the replaced expression. It follows that STANDARDIZE preserves the
form (I-IV).

To prove the remaining parts of the lemma, it must first be established
that $SC_P$ is safe with respect to $lm(SC_P)$. Recall the construction of $SC_P$
from section 6.2, page 152, where set constraints are described for each of
the five different kinds of environment constraints. In most cases it is easy
to see that the quantified set expressions introduced are safe with respect
to $lm(SC_P)$ because they contain only quantified conditions of the form
$X \in se$ or $s \in a$ where $a$ contains only function symbols. There are only
three non-trivial cases.

The  first  involves  the  set  constraints 
D  {Xi:defined{t)  ^ 

introduced  during  the  translation  of  9^  D  where  defined{t)  de¬ 

notes  the  conjunction  of  all  conditions  of  the  form  s  €  /(T, . . . ,  T)  such  that 
/^7)^(s)  is  a  subterm  of  t.  Let  conj  denote  defined(t)  A  ^  To 

show that the quantified set expressions $\{X_i : conj\}$ are safe with respect to
$lm(SC_P)$, consider an environment $\rho$ such that $\rho \notin lm(SC_P)(conj)$. Clearly
there must exist at least one quantified condition $s \in se$ in $conj$ such that
$\rho \notin lm(SC_P)(s \in se)$. Pick the quantified condition such that the number of
function symbols in $s$ is minimized. Now, if it is not the case that $\rho \triangleright s$, then
$s$ must contain a subterm of the form $f^{-1}_{(i)}(s')$ such that either $\rho(s')$ is not defined
or else $\rho(s')$ is not of the form $f(\cdots)$. This means that $s' \in f(\top,\ldots,\top)$
must appear in $conj$ and that $\rho \notin \mathcal{I}(s' \in f(\top,\ldots,\top))$. However $s'$ contains
fewer function symbols than $s$, and this contradicts the choice of $s$. Hence it
must  be  the  case  that  p  >  s,  and  this  completes  the  proof  that  each  quamti- 
fied  set  expression  {Xi :  d€fined(t)  A  Aj=i..m  €  'I'/}  is  safe  with  respect 
to  lm{SCp). 

The  second  non-trivial  case  involves  the  set  constraints 

Xi  D  {X,  :  dcyine(/(oonifc)  A  €  A/} 

introduced  during  the  translation  of  D  j  V  •  •  •  V  conj„].  The  proof 

for  the  quantified  conditions  in  these  constraints  is  identical  to  the  first  case. 

The  third  non-trivial  case  involves  the  set  constraints 

Xi  2  {Xi :  translateiconjk)  A  Ai=i..m-^i  ^  Xj} 

introduced  during  the  translation  of  ’S"*  3  '^^[conji  V  •  •  •  V  conj^].  Let 
J  denote  lm{SCp).  The  safeness  of  these  quantified  set  expressions  relies 
on  a  combination  of  two  factors:  first,  the  quantified  conditions  Xj  G  Xj, 
j  =  l..m,  and  second,  properties  of  the  sets  assigned  to  the  Xj  under  I. 
These two factors are combined in the property

if  pel  (Aj=i..m  Xj  6  Xj)  then  p  >  conj^  (7.27) 

which  clearly  implies  that  {X,- :  translate^conj h)  A  €  Aj}  is  safe 

with  respect  to  I.  To  prove  (7.27),  consider  the  values  of  the  variables  Xj 
under  J.  Now,  the  constraints  for  these  variables  are 

Xj  D  {Xj  :  defined{conjk)  A  Ai=i..mXi  e  Xj^} 

and  moreover,  there  is  only  one  lower  bound  for  each  variable  Xj.  Hence, 
by  Proposition  18,  these  inequalities  are  in  fact  equalities  in  I.  That  is, 

I{Xj)  =  I{{Xj  :  defin&I(conji,)  A  Al=i..m  e  J  =  l..m  (7.28) 

Using  the  equality  (7.28),  the  expression  p  6  2(Aj=i..mXj  6  Xj)  can  be 
significantly  simplified.  Recall  that  Xi,...,Xm  is  a  list  of  the  program 
variables,  and  let  g  denote  the  set  environment  that  maps  the  program 
variables  Xj  into  2{Xj).  Now,  consider  the  following  chain  of  equivalences: 

iff  A 

iff  A  PiXj)  e  I{{Xj  :  definediconj,,)  A  Xj  ^  Xj^}) 

j=l..tn  j=X..*n 

iff  A  PiXj)e{piXj):peI{defined{conjk))  A  A  Pi^j)  eI{Xj')} 

j=l..m  j=l..m 

iff  A  P(Xj)  €  {piXj) :  p  e  I{defined{conjk))  A  p  e  g] 

j=l..m 

iff  p  e  A({p  e  g  :  P  e  I(defined{conjii))}) 

iff  pe  A  {{p  €  g  :  />(«)  €  T{se)  for  each  s  €  se  in  defined{conj  i^)}) 

iff  p  eA  ({p  €  g  :  p{s)  has  form  /(•  •  •)  for  each  term  in  condjt}^ 

iff  peA  {{p  €  0  :  p  >  condjfe}) 

Now,  by  Proposition  14,  {p  G  p  :  p  >  condk}  is  set  based.  Hence  the  con¬ 
dition  p  ^  A  ({p  €  p  :  p  >  condk})  is  equivalent  to  p  >  condk.  This  means 
that  pel  (Aj=i..m  Xj  e  Xj^  implies  that  p  t>  condk,  and  this  completes 
the proof of (7.27). This completes the proof that all of the quantified set
expressions appearing in $SC_P$ are safe with respect to $lm(SC_P)$.

Consider the application of STANDARDIZE to $SC_P$. Suppose that $C'$ is
obtained from $C$ by a single step of STANDARDIZE and that $C$ is safe with
respect to $lm(C)$. It has already been proved that $lm(C)(\mathcal{X}) = lm(C')(\mathcal{X})$
for each $\mathcal{X} \in var(C)$ (see the proof of Proposition 21). Now, suppose that
this step of STANDARDIZE does not introduce any new quantified set expressions, and this
implies that $C'$ is safe with respect to $lm(C)$. Since the safety of a quantified
set expression $se$ with respect to an interpretation only depends on the set
variables appearing in $se$, it follows that $C'$ is safe with respect to $lm(C')$.

Now consider the case where the step of STANDARDIZE does introduce
new quantified set expressions. As has already been noted, the only step that
can do this is the step that replaces a non-standard occurrence with a new
variable. Hence, $C$ must contain a quantified set expression $\{X : conj \wedge exp\}$
such that $exp$ is either $s \in se$ or $s \dagger se$, and the new quantified set
expression in $C'$ is $\{X : conj \wedge exp'\}$ where $exp'$ is $s \in \mathcal{Z}$ or $s \dagger \mathcal{Z}$ and
$\mathcal{Z}$ is a new set variable. Moreover, $\mathcal{Z} \supseteq se$ must be the only lower bound
for $\mathcal{Z}$ in $C'$. Proposition 18 implies that $lm(C')(\mathcal{Z}) = lm(C')(se)$ and so
$lm(C')(exp) = lm(C')(exp')$. Since $\{X : conj \wedge exp\}$ contains only variables
from $C$, and $\{X : conj \wedge exp\}$ is safe with respect to $lm(C)$, it follows that
$\{X : conj \wedge exp\}$ is safe with respect to $lm(C')$. In summary,

• $lm(C')(exp) = lm(C')(exp')$;

• $\rho \triangleright exp$ iff $\rho \triangleright exp'$ (this is easy to verify), and

• $\{X : conj \wedge exp\}$ is safe with respect to $lm(C')$.

Hence the three preconditions of Proposition 30 hold. It follows that
$\{X : conj \wedge exp'\}$ is safe with respect to $lm(C')$.

This proves that a single step of STANDARDIZE preserves safeness with
respect to the least model. Repeatedly applying this fact proves that if $C$
is safe with respect to $lm(C)$ then STANDARDIZE($C$) is safe with respect to
$lm(\mathrm{STANDARDIZE}(C))$.

Now, it was previously shown that $SC_P$ is safe with respect to $lm(SC_P)$,
and so STANDARDIZE($SC_P$) is safe with respect to $lm(\mathrm{STANDARDIZE}(SC_P))$.
Hence Corollary 1 can be applied to prove that

  $lm(\mathrm{STANDARDIZE}(SC_P)) = lm(\mathrm{REDUCE}(\mathrm{STANDARDIZE}(SC_P)))$.

By the correctness of STANDARDIZE (Proposition 21),
$lm(SC_P) =_{var(SC_P)} lm(\mathrm{STANDARDIZE}(SC_P))$. Hence

  $lm(SC_P) =_{var(SC_P)} lm(\mathrm{STANDARDIZE}(SC_P))$
  $\qquad = lm(\mathrm{REDUCE}(\mathrm{STANDARDIZE}(SC_P)))$
  $\qquad = lm(C_0)$.

Furthermore, since the constraints STANDARDIZE($SC_P$) are safe with respect
to $lm(\mathrm{STANDARDIZE}(SC_P))$, Lemma 14 also implies that $C_0$ is safe with
respect to $lm(C_0)$. □


Transformations 

We have just shown how set constraints $SC_P$ can be converted into equivalent
constraints $C_0$ that are in standard form and reduced form. We now present
an instance of the generic algorithm for obtaining the least model of the
constraints $C_0$. The instance of the generic algorithm is defined by the
following collection of transformations. The first group of transformations
deals with substitution.


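Transformation 5 (Qexp-Substitution) If $C$ contains the two constraints
$\mathcal{X} \supseteq \{X : (s \in \mathcal{Y}) \wedge conj\}$ and $\mathcal{Y} \supseteq a$ where $a$ is atomic, then
output REDUCE($\mathcal{X} \supseteq \{X : (s \in a) \wedge conj\}$). □

Transformation 6 ($\cap$-Substitution) If $C$ contains the two constraints
$\mathcal{X} \supseteq a_1 \cap \cdots \cap a_{i-1} \cap \mathcal{Y} \cap a_{i+1} \cap \cdots \cap a_n$ and $\mathcal{Y} \supseteq a$ where $a$ is atomic
and $n \geq 2$, then output $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_{i-1} \cap a \cap a_{i+1} \cap \cdots \cap a_n$. □

Transformation 7 (Var-Substitution) If $C$ contains $\mathcal{Y} \supseteq a$ and $\mathcal{X} \supseteq \mathcal{Y}$,
where $a$ is atomic, then output $\mathcal{X} \supseteq a$. □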
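
The two simpler substitution rules can be sketched on a toy encoding. This is only an illustration under stated assumptions, not the implementation described in this thesis: a constraint lhs ⊇ rhs is a pair (lhs, rhs), set variables are capitalised strings, other strings and tuples such as ("g", "b", "c") are non-variable atomic expressions, and an intersection is a tuple tagged "and". The function names are hypothetical.

```python
def is_var(e):
    # Assumption: set variables are capitalised strings.
    return isinstance(e, str) and e[:1].isupper()

def is_atomic_nonvar(e):
    # Anything that is neither a variable nor an "and"-tagged intersection.
    return not is_var(e) and not (isinstance(e, tuple) and e[0] == "and")

def var_substitution(constraints):
    """Transformation 7: from Y ⊇ a and X ⊇ Y (a atomic) derive X ⊇ a."""
    out = set()
    for x, y in constraints:
        if is_var(y):
            for y2, a in constraints:
                if y2 == y and is_atomic_nonvar(a):
                    out.add((x, a))
    return out

def and_substitution(constraints):
    """Transformation 6: replace Y inside an intersection by a lower bound a."""
    out = set()
    for x, rhs in constraints:
        if isinstance(rhs, tuple) and rhs[0] == "and":
            parts = rhs[1:]
            for i, y in enumerate(parts):
                if is_var(y):
                    for y2, a in constraints:
                        if y2 == y and is_atomic_nonvar(a):
                            out.add((x, ("and",) + parts[:i] + (a,) + parts[i + 1:]))
    return out

C = {("X", "Y"), ("Y", "b"), ("Z", ("and", "Y", "c"))}
new_var = var_substitution(C)   # constraints derived by Var-Substitution
new_and = and_substitution(C)   # constraints derived by ∩-Substitution
```

Both functions return only the newly derived constraints, mirroring the way each transformation outputs constraints to be added to the collection.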

We remark that the notion of substitution described by these transformations
is more restrictive than the substitution used in the intersection-projection
algorithm. In particular, recall that Transformation 6 substituted
for any set variable $\mathcal{Y}$ appearing in an expression $op(a_1, \ldots, a_{i-1}, \mathcal{Y}, a_{i+1}, \ldots, a_n)$.
If this very general notion of substitution was carried over to quantified set
expressions, then there would be a substitution for the set variable $\mathcal{Y}$ in the
constraint $\mathcal{X} \supseteq \{X : s \dagger \mathcal{Y}\}$. However the above transformations do not
admit substitutions into quantified conditions of the form $s \dagger \mathcal{Y}$ and, as we
shall see later, this is specifically required for termination.

Note that in the first transformation, the expression $\{X : (s \in a) \wedge conj\}$
may not be in reduced form. Hence REDUCE must be applied. For example,
consider the constraints

  $\mathcal{X} \supseteq \{X : g(X, X) \in \mathcal{Y}\}$

  $\mathcal{Y} \supseteq g(b, c)$

where $b$ and $c$ are constants and $g$ is a binary function symbol. When the
second constraint is substituted into the first, the resulting constraint
$\mathcal{X} \supseteq \{X : g(X, X) \in g(b, c)\}$ is not in reduced form. The subsequent application
of REDUCE results in the following reduced form constraints.

  $\mathcal{X} \supseteq$

  $\mathcal{X} \supseteq \{X : X \in c\}$

The second group of transformations deals with simplifying projections.
We note that there are two possible approaches to dealing with projections.
We could just extend the transformation for projections from the
projection-intersection algorithm (see Transformation 3, page 182) to deal with the
additional cases involving $\top$ and constants of the form $\overline{S}$. However, since
projections are essentially special cases of quantified set expressions, another
approach is to convert them to quantified set expressions. We choose the
latter approach for presentational simplicity, since it avoids some duplication
of work. (However, note that there are implementation reasons for
distinguishing between arbitrary quantified set expressions and the special case of
projections.) For projections, we therefore have the single transformation:


Transformation 8 (Projection) If $C$ contains $\mathcal{X} \supseteq f^{-1}_{(i)}(a)$ then output
REDUCE($\mathcal{X} \supseteq \{X_i : f(X_1, \ldots, X_n) \in a\}$) where the arity of $f$ is $n$ and
$X_1, \ldots, X_n$ are distinct program variables. □

It is assumed that the $X_1, \ldots, X_n$ are chosen in some canonical manner (for
example, using some fixed listing of var) so that this transformation cannot
be repeatedly applied to produce REDUCE($\mathcal{X} \supseteq \{X_i : f(X_1, \ldots, X_n) \in a\}$),
and then REDUCE($\mathcal{X} \supseteq \{X_i' : f(X_1', \ldots, X_n') \in a\}$) etc.
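
The identity that justifies converting a projection into a quantified set expression can be checked by brute force on a small finite example. The sketch below is illustrative only: the value universe `values`, the interpretation `I_a` of the atomic expression a, and the helper names are assumptions made for the example, and a term f(u, v) is encoded as the tuple ("f", u, v).

```python
from itertools import product

values = ["b", "c"]                               # assumed universe for X_1, X_2
I_a = {("f", "b", "c"), ("f", "c", "c"), "b"}     # hypothetical interpretation of a

def projection(i, arity=2):
    """Interpretation of the i-th projection of f applied to a (i is 1-based)."""
    return {t[i] for t in I_a
            if isinstance(t, tuple) and t[0] == "f" and len(t) == arity + 1}

def quantified(i, arity=2):
    """Interpretation of {X_i : f(X_1, ..., X_n) in a}, enumerating valuations."""
    return {rho[i - 1] for rho in product(values, repeat=arity)
            if ("f",) + rho in I_a}

first = projection(1)
second = projection(2)
```

On this example both readings agree in each argument position, which is the content of the equality stated for Transformation 8 in Proposition 37.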

The next group of transformations deals with intersection. These
transformations generalize the intersection transformation used in the
projection-intersection algorithm (see Transformation 4, page 182) to deal with $\top$
and $\overline{S}$.

Transformation 9 (Intersection-1) If $C$ contains $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m$, $m \geq 2$,
then let $\overline{S_1}, \ldots, \overline{S_n}$ and $a_1', \ldots, a_{m-n}'$ be subsequences of $a_1, \ldots, a_m$ such
that the first subsequence contains the complement constants in $a_1, \ldots, a_m$,
and the second contains the remaining atomic set expressions, and output
the constraint $\mathcal{X} \supseteq a_1' \cap \cdots \cap a_{m-n}' \cap \overline{S}$ where $S = S_1 \cup \cdots \cup S_n$. □

Transformation 10 (Intersection-2) If $C$ contains $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m \cap \overline{S}$
such that $m \geq 2$, and each $a_i$ is of the form $f(a_{i,1}, \ldots, a_{i,n})$ then let
$N_j = \bigcup_{i=1..m} \mathcal{N}(a_{i,j})$, $j = 1..n$, and output the constraints

• $\mathcal{X} \supseteq f(V_{N_1}, \ldots, V_{N_n}) \cap \overline{S}$, and

• $V_{N_j} \supseteq a_{1,j} \cap \cdots \cap a_{m,j}$ for each $V_{N_j} \notin var(C)$. □

Transformation 11 (Intersection-3) If $\mathcal{X} \supseteq f(a_1, \ldots, a_n) \cap \overline{S}$ appears
in $C$ and $f(\top, \ldots, \top) \notin S$ then

(a) if $f'(\cdots) \in S$ implies $f' \neq f$ then output $\mathcal{X} \supseteq f(a_1, \ldots, a_n)$;

(b) otherwise, pick an element of the form $f(se_1, \ldots, se_n)$ from $S$, let $S'$
be $S - \{f(se_1, \ldots, se_n)\}$, let $N_j$ be $\mathcal{N}(a_j) \cup \{\overline{se_j}\}$, $j = 1..n$, and output
the constraints:

• $\mathcal{X} \supseteq f(a_1, \ldots, a_{j-1}, V_{N_j}, a_{j+1}, \ldots, a_n) \cap \overline{S'}$, $j = 1..n$; and

• $V_{N_j} \supseteq a_j \cap \overline{se_j}$ for each $V_{N_j} \notin var(C)$. □

The first transformation serves to collect constants of the form $\overline{S}$ together
so that constraints of the form $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_k \cap \overline{S_1} \cap \cdots \cap \overline{S_n}$ are converted
into the form $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_k \cap \overline{S}$. Note that in the boundary case where
$n = 0$, $S$ is the empty set and the constraint $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_k \cap \top$ is
constructed. The second transformation then combines expressions of the
form $f(\cdots) \cap \cdots \cap f(\cdots)$ into $f(\cdots)$. The final transformation deals with
the interaction of expressions of the form $f(\cdots)$ and $\overline{S}$. Note that these
transformations also simplify constraints involving $\top$ because $\top$ is identified
with $\overline{\{\,\}}$. Hence Transformation 11 simplifies $\mathcal{X} \supseteq a \cap \top$ into $\mathcal{X} \supseteq a$.
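
The collecting step performed by the first of these transformations can be sketched as follows. This is a minimal illustration under assumed conventions rather than the thesis's implementation: a complement constant is encoded as the pair ("not", frozenset_of_expressions), every other conjunct is treated as an ordinary atomic expression, and the name `collect_complements` is hypothetical.

```python
def collect_complements(conjuncts):
    """Split an intersection into its ordinary atoms and one merged complement.

    Relies on the fact that intersecting complements of S_1, ..., S_n gives
    the complement of S_1 ∪ ... ∪ S_n.
    """
    atoms, merged = [], frozenset()
    for c in conjuncts:
        if isinstance(c, tuple) and c[0] == "not":
            merged |= c[1]          # union of the excluded expressions
        else:
            atoms.append(c)
    return atoms, ("not", merged)

example = ["a", ("not", frozenset({"b"})), ("not", frozenset({"c"}))]
atoms, complement = collect_complements(example)

# Boundary case n = 0: the merged complement excludes nothing, i.e. it is top.
_, top = collect_complements(["a", "d"])
```

Returning the empty complement in the boundary case matches the identification of the top element with the complement of the empty set.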

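To illustrate the behavior of these transformations, consider the
constraints

  $\mathcal{X} \supseteq g(b, b) \cap g(\mathcal{Y}, \mathcal{Y}) \cap \overline{\{g(b, c)\}} \cap \overline{\{g(c, b)\}}$

  $\mathcal{Y} \supseteq b$.

In particular, note that in the least model of these constraints, $g(b, b)$ is
an element of $\mathcal{X}$ (in fact it is the only element of $\mathcal{X}$). We now show how
the intersection transformations add to these constraints to make this fact
explicit. First Transformation 9 is applied to obtain

  $\mathcal{X} \supseteq g(b, b) \cap g(\mathcal{Y}, \mathcal{Y}) \cap \overline{\{g(b, c), g(c, b)\}}$

Transformation 10 can now be applied to this constraint to yield:

  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y}\}}, V_{\{b,\mathcal{Y}\}}) \cap \overline{\{g(b, c), g(c, b)\}}$

  $V_{\{b,\mathcal{Y}\}} \supseteq b \cap \mathcal{Y}$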

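A subsequent application of Transformation 10 to $V_{\{b,\mathcal{Y}\}} \supseteq b \cap \mathcal{Y}$ yields
$V_{\{b,\mathcal{Y}\}} \supseteq b$. Also, Transformation 11 can be applied to
$\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y}\}}, V_{\{b,\mathcal{Y}\}}) \cap \overline{\{g(b, c), g(c, b)\}}$ to obtain:

  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y}\}}, V_{\{b,\mathcal{Y},\overline{c}\}}) \cap \overline{\{g(c, b)\}}$
  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y},\overline{b}\}}, V_{\{b,\mathcal{Y}\}}) \cap \overline{\{g(c, b)\}}$
  $V_{\{b,\mathcal{Y},\overline{b}\}} \supseteq V_{\{b,\mathcal{Y}\}} \cap \overline{b}$
  $V_{\{b,\mathcal{Y},\overline{c}\}} \supseteq V_{\{b,\mathcal{Y}\}} \cap \overline{c}$

Applying Transformation 11 to the first two of these constraints yields:

  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y},\overline{c}\}}, V_{\{b,\mathcal{Y},\overline{c}\}})$
  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y}\}}, V_{\{b,\mathcal{Y},\overline{b},\overline{c}\}})$
  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y},\overline{b},\overline{c}\}}, V_{\{b,\mathcal{Y}\}})$
  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y},\overline{b}\}}, V_{\{b,\mathcal{Y},\overline{b}\}})$
  $V_{\{b,\mathcal{Y},\overline{b},\overline{c}\}} \supseteq V_{\{b,\mathcal{Y},\overline{b}\}} \cap \overline{c}$

Finally, consider applying transformations to the constraints for
$V_{\{b,\mathcal{Y},\overline{c}\}}$ and $V_{\{b,\mathcal{Y},\overline{b}\}}$. This yields the single constraint
$V_{\{b,\mathcal{Y},\overline{c}\}} \supseteq b$.

To summarize, the collection now contains the following explicit form
constraints:

  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y},\overline{b},\overline{c}\}}, V_{\{b,\mathcal{Y}\}})$
  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y}\}}, V_{\{b,\mathcal{Y},\overline{b},\overline{c}\}})$
  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y},\overline{c}\}}, V_{\{b,\mathcal{Y},\overline{c}\}})$
  $\mathcal{X} \supseteq g(V_{\{b,\mathcal{Y},\overline{b}\}}, V_{\{b,\mathcal{Y},\overline{b}\}})$
  $\mathcal{Y} \supseteq b$
  $V_{\{b,\mathcal{Y}\}} \supseteq b$
  $V_{\{b,\mathcal{Y},\overline{c}\}} \supseteq b$

Hence, $g(b, b) \in \mathcal{X}$ is now explicit (that is, $g(b, b)$ is now an element of $\mathcal{X}$ in
the least model of $explicit(C)$).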
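
The claim that g(b, b) is the only element of the right-hand side in this example can be sanity-checked by direct evaluation. The sketch below is not the transformation-based algorithm; it simply evaluates the original intersection under the interpretation that maps the set variable Y to {b}. The universe is restricted, as an assumption of the example, to the constants b and c and the terms g(u, v) over them, and all names are illustrative.

```python
from itertools import product

consts = ["b", "c"]
# Assumed finite universe: constants plus depth-one g-terms.
universe = set(consts) | {("g", u, v) for u, v in product(consts, repeat=2)}

def g_of(left, right):
    """Interpretation of g(a1, a2) given the interpretations of a1 and a2."""
    return {("g", u, v) for u in left for v in right}

def complement(excluded):
    """Interpretation of a complement constant, relative to the universe."""
    return universe - excluded

Y = {"b"}                                   # from the constraint Y ⊇ b
rhs = (g_of({"b"}, {"b"}) & g_of(Y, Y)
       & complement({("g", "b", "c")}) & complement({("g", "c", "b")}))
```

The two complement constants remove g(b, c) and g(c, b) only, so the single term g(b, b) survives the intersection.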

The final group of transformations deals with quantified set expressions.
The first two transformations serve to remove apartness conditions. One
deals with $s \dagger a$ in the case where $a$ is a singleton set under the
interpretation $lm(explicit(C))$, and the other deals with the case where $a$ contains more
than one element. The last transformation replaces a quantified set
expression with an intersection of atomic set expressions, and for this
transformation some preliminary definitions are needed. Define that a conjunction of
quantified conditions $conj$ is in variable-expression form (called compaction
form in Transformation 14) if each quantified condition in $conj$ is of the
form $X \in a$ where $X$ is a program variable and $a$ is an atomic set expression.
For such a conjunction, let $X$ be a program variable appearing in $conj$ and
define that $\bigcap_X conj$ is $a_1 \cap \cdots \cap a_n$ where $X \in a_1, \ldots, X \in a_n$ lists all of the
quantified conditions in $conj$ that have the form $X \in a$.

Transformation 12 (Qexp-$\dagger$-1) If $C$ contains $\mathcal{X} \supseteq \{X : s \dagger a \wedge conj\}$ and
$lm(explicit(C))(a) = \{v\}$ then output REDUCE($\mathcal{X} \supseteq \{X : s \in \overline{v} \wedge conj\}$). □

Transformation 13 (Qexp-$\dagger$-2) If $C$ contains $\mathcal{X} \supseteq \{X : s \dagger a \wedge conj\}$ and
$lm(explicit(C))(a)$ contains more than one element then output the constraint
$\mathcal{X} \supseteq \{X : conj\}$. □

Transformation 14 (Qexp-Compaction) If $C$ contains $\mathcal{X} \supseteq \{X : conj\}$
such that $conj$ is in compaction form and $lm(explicit(C))(\bigcap_Y conj)$ is
non-empty for each program variable $Y$ appearing in $conj$, then output the
constraint $\mathcal{X} \supseteq \bigcap_X conj$. □
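
The case split between the two apartness transformations can be illustrated on finite sets. The sketch below assumes the reading of apartness used in the proof of Proposition 42, namely that a value is apart from a set when the set contains some element different from that value. The universe and the name `apart` are assumptions made for the example.

```python
def apart(value, a_values):
    """True when some element of a_values differs from value."""
    return any(v != value for v in a_values)

universe = {"b", "c", "d", "e"}     # assumed finite universe of values

# Singleton case (Transformation 12): being apart from {e} coincides with
# membership in the complement of {e}.
single = {x for x in universe if apart(x, {"e"})}

# Multi-element case (Transformation 13): every value is apart from the set,
# so the apartness condition can simply be dropped.
multi = {x for x in universe if apart(x, {"d", "e"})}
```

This is why the first transformation rewrites the condition into a membership test against a complement constant, while the second removes it altogether.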

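To illustrate this group of transformations, consider a collection of
constraints $C$ that contains the following constraints:

  $\mathcal{X} \supseteq \{X : f(X, X) \in \mathcal{Y} \wedge X \dagger \mathcal{Z}\}$

  $\mathcal{Y} \supseteq f(b, c)$
  $\mathcal{Y} \supseteq f(d, d)$
  $\mathcal{Y} \supseteq f(e, e)$

Suppose that there are additional constraints for $\mathcal{Z}$ and that, in the least
model of $explicit(C)$, $\mathcal{Z}$ is $\{e\}$. We now show how transformations can be
applied so that $d \in \mathcal{X}$ in the least model becomes explicit. First,
Transformation 12 can be applied to produce

  $\mathcal{X} \supseteq \{X : f(X, X) \in \mathcal{Y} \wedge X \in \overline{e}\}$.

Now, the constraints for $\mathcal{Y}$ can all be used to substitute for the occurrence
of $\mathcal{Y}$ in $\mathcal{X} \supseteq \{X : f(X, X) \in \mathcal{Y} \wedge X \in \overline{e}\}$, and the result is the following
constraints:

  $\mathcal{X} \supseteq \{X : X \in b \wedge X \in c \wedge X \in \overline{e}\}$

  $\mathcal{X} \supseteq \{X : X \in d \wedge X \in \overline{e}\}$

  $\mathcal{X} \supseteq \{X : X \in e \wedge X \in \overline{e}\}$

The compaction transformation can be applied to these three constraints to
obtain:

  $\mathcal{X} \supseteq b \cap c \cap \overline{e}$

  $\mathcal{X} \supseteq d \cap \overline{e}$

  $\mathcal{X} \supseteq e \cap \overline{e}$

On applying the intersection transformation (Transformation 10) to the
second of these constraints, the constraint $\mathcal{X} \supseteq d$ is obtained, and thus
$d \in \mathcal{X}$ becomes explicit.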

Finally, we can define the algorithm for solving quantified set
expressions. Let $\Delta_2$ denote Transformations 5-14, and define that the quantified
expression algorithm inputs set constraints $C$, converts $C$ into constraints $C_0$
by applying STANDARDIZE and then REDUCE, and then exhaustively applies
the transformations $\Delta_2$ to $C_0$ as outlined by the generic algorithm.
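
The exhaustive application step can be pictured as a simple saturation loop. The following is a hedged sketch only: it assumes each transformation is a function from the current constraint collection to a set of derived constraints, it uses Var-Substitution (Transformation 7) on a toy pair encoding as the single example transformation, and it does not model STANDARDIZE, REDUCE, or the remaining transformations. All names are hypothetical.

```python
def is_var(e):
    # Assumption: set variables are capitalised strings.
    return isinstance(e, str) and e[:1].isupper()

def var_substitution(constraints):
    """From Y ⊇ a and X ⊇ Y, with a a non-variable atom, derive X ⊇ a."""
    return {(x, a)
            for x, y in constraints if is_var(y)
            for y2, a in constraints if y2 == y and not is_var(a)}

def saturate(constraints, transformations):
    """Add derived constraints until no transformation yields anything new."""
    current = set(constraints)
    changed = True
    while changed:
        changed = False
        for transform in transformations:
            new = transform(current) - current
            if new:
                current |= new
                changed = True
    return current

C0 = {("X", "Y"), ("Y", "Z"), ("Z", "b")}
closed = saturate(C0, [var_substitution])
```

The loop only ever adds constraints, in keeping with the way the correctness argument below reasons about the collection C together with the output of each transformation.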


Correctness 

The proof of correctness of the quantified set expression algorithm is fairly
similar in structure to that for the intersection-projection algorithm
described in the previous section. The main differences are that atomic set
expressions now include $\top$ and $\overline{S}$, and the presence of quantified set
expressions. We shall often omit proof details for cases that are essentially the
same as those in the previous section, and instead focus on the new cases.
We begin by considering a generalization of the atomic set expression
invariant employed in the intersection-projection algorithm (see Proposition
22, page 185).


Invariant 3 (Atomic Set Expression Invariant) A collection $C$ of
constraints satisfies the atomic set expression invariant if each atomic set
expression in $atomic(C)$ either

(i) appears in $atomic(C_0)$;

(ii) is introduced by an application of Transformation 12;

(iii) is of the form $f(a_1, \ldots, a_n)$ where $f$ is a function symbol appearing in
$C_0$, and each $a_i$ is either an intersection variable or a strict subterm
of some atomic set expression that falls into cases (i) or (ii); or

(iv) is of the form $\overline{S}$ such that $S \subseteq atomic(S_1 \cup \cdots \cup S_n)$ for some
complement constants $\overline{S_1}, \ldots, \overline{S_n}$ satisfying cases (i) or (ii) of the invariant.

□

The main changes in the (modified) atomic expression invariant are due to
Transformations 9, 11 and 12, which may introduce new atomic set
expressions of the form $\overline{S}$. It is convenient to prove this invariant in tandem with
two additional properties: during the algorithm all constraints are both in
standard form and reduced form.


Proposition 33 (Invariants) Each $C_i$ constructed by the algorithm is
in reduced form and standard form, and satisfies the atomic set expression
invariant.


Proof: By the Initialization Lemma (Lemma 15), the initial constraints
$C_0 = \mathrm{REDUCE}(\mathrm{STANDARDIZE}(SC_P))$ are in reduced form and standard form.
It is easy to verify that each transformation preserves reduced form and
standard form, noting that whenever a transformation may construct
constraints that are not in reduced form, the procedure REDUCE is immediately
applied to return the constraints to reduced form. This completes the first
part of the proof.

Now consider the atomic expression invariant. Clearly $C_0$ satisfies the
atomic set expression invariant, and it therefore remains to prove that each
transformation preserves this invariant. To this end, let $C$ be a collection of
constraints that satisfies the atomic set expression invariant, and consider
the sets $\delta(C)$ for each transformation $\delta$ in $\Delta_2$. First suppose that $\delta$ is one
of Transformations 6, 7, 13 and 14. In all of these cases, it is clear that
$atomic(\delta(C)) \subseteq atomic(C)$, and so the proof is trivial.

Before considering the remaining transformations, it is useful to outline
some properties of the atomic set expression invariant. Suppose that $C$
satisfies the atomic set expression invariant. Now, note that condition (iii)
can only be satisfied by an atomic set expression of the form $f(\cdots)$ and
that condition (iv) of this invariant can only be satisfied by a complement
constant. Hence, if $f(\cdots)$ is an atomic set expression appearing in $C$ then
it must satisfy one of conditions (i-iii) of the invariant. Similarly, if $\overline{S}$
appears in $C$ then it must satisfy one of conditions (i), (ii) and (iv), and this
implies that there are constants $\overline{S_1}, \ldots, \overline{S_n}$ that satisfy parts (i) or (ii) of the
invariant such that $S \subseteq atomic(S_1 \cup \cdots \cup S_n)$.

Transformation 5: Let $se_{old}$ denote $\{X : (s \in \mathcal{Y}) \wedge conj\}$ and let $se_{new}$
denote $\{X : (s \in a) \wedge conj\}$. Note that $a$ is a non-variable atomic set
expression. Since $\mathcal{X} \supseteq se_{old}$ and $\mathcal{Y} \supseteq a$ both satisfy the atomic set expression
invariant, it follows that $\mathcal{X} \supseteq se_{new}$ also satisfies this invariant. If $s$ is a
program variable, then $\mathcal{X} \supseteq se_{new}$ is in reduced form, and REDUCE has no
effect. Hence in this case $\delta(C)$ trivially satisfies the atomic set expression
invariant.

In the remaining case, $s$ is not a program variable, and REDUCE is needed
to return the constraint $\mathcal{X} \supseteq se_{new}$ to reduced form. Recall that REDUCE is
defined to be the exhaustive application of the six steps shown in Figure 7.5,
page 199. Clearly these steps do not affect the quantified conditions in $conj$.
Hence, the application of REDUCE to $\mathcal{X} \supseteq se_{new}$ can only involve applications
of steps to the quantified condition $s \in a$ and any new quantified conditions
produced by such steps. It is easy to verify that all new quantified conditions
produced must have the form $s' \in a'$ where $s'$ is a subterm of $s$ and $a'$ is
either a subterm of $a$ or else of the form $\overline{S}$ where there exists a constant $\overline{S'}$ in
$a$ such that $S \subseteq atomic(S')$. (Note that step (vi) cannot be applied during
REDUCE($\mathcal{X} \supseteq se_{new}$), and this is the only step that builds up completely
new set expressions.) It follows that each element of $atomic(\delta(C))$ either (a)
appears in $atomic(C)$ or (b) has the form $\overline{S}$ such that $atomic(C)$ contains a
constant $\overline{S'}$ and $S \subseteq atomic(S')$. By combining this with the assumption
that $C$ satisfies the atomic set expression invariant, it is easy to verify that
REDUCE($\mathcal{X} \supseteq se_{new}$) satisfies the atomic set expression invariant.

Transformation 8: Since $C$ satisfies the atomic set expression invariant
it is clear that $\mathcal{X} \supseteq \{X_i : f(X_1, \ldots, X_n) \in a\}$ also satisfies the atomic
set expression invariant. The argument that $\delta(C) = \mathrm{REDUCE}(\mathcal{X} \supseteq \{X_i :
f(X_1, \ldots, X_n) \in a\})$ also satisfies this invariant is identical to that for
Transformation 5.

Transformation 9: This transformation may introduce new atomic set
expressions of the form $\overline{S}$ such that $\overline{S_1}, \ldots, \overline{S_n}$ appear in $atomic(C)$ and
$S = S_1 \cup \cdots \cup S_n$. It is immediate that $\overline{S}$ satisfies part (iv) of the invariant.

Transformation 10: This transformation may introduce new atomic set
expressions of the form $f(V_{N_1}, \ldots, V_{N_n})$ such that each $V_{N_j}$ is an intersection
variable and $f$ is a function symbol appearing in $C$ (and hence in $C_0$, since
no transformation can introduce new function symbols). Such expressions
satisfy part (iii) of the atomic set expression invariant.

Transformation 11: The new atomic set expressions introduced by this
transformation are either of the form (a) $f(a_1, \ldots, a_n)$ such that each $a_i$
is an intersection variable or a strict subterm of an atomic set expression
appearing in $atomic(C)$, or (b) $\overline{se_i}$ such that $atomic(C)$ contains an
expression of the form $\overline{S}$ where $f(se_1, \ldots, se_i, \ldots, se_n) \in S$. First consider case
(a). Since $C$ satisfies the atomic expression invariant, it follows that if $a_i$ is
a strict subterm of an atomic set expression appearing in $atomic(C)$, then
$a_i$ must either appear in $atomic(C_0)$, be introduced by Transformation 12,
or else be an intersection variable. Hence, it is clear that in case (a), the
new atomic set expression satisfies part (iii) of the atomic set expression
invariant. Now consider case (b). In this case, it is easy to verify that the
expression $\overline{se_i}$ satisfies the atomic set expression invariant because $\overline{S}$ must
satisfy part (iv) of the invariant.

Transformation 12: By definition, any new atomic set expressions
introduced by this transformation satisfy part (ii) of the atomic expression
invariant. □

We next establish some basic properties of the transformations.
Intuitively, each transformation picks a constraint $\mathcal{X} \supseteq se$ from the current
collection $C$ of constraints and endeavors to make the information contained in
the constraint explicit by adding new constraints $\mathcal{X} \supseteq se_1, \ldots, \mathcal{X} \supseteq se_n$ that
are "closer" to explicit form (strictly speaking, the transformations dealing
with intersection output some additional constraints for new intersection
variables). Now, the following sequence of propositions considers each
transformation in turn and essentially relates the expression $se$ in the constraint
$\mathcal{X} \supseteq se$ picked from $C$ with the expressions $se_1, \ldots, se_n$ in the constraints
constructed by the transformation. These relationships shall be used to
prove that the transformations are sound (in the sense that they preserve
$lm(C)$) and complete (in the sense that, on termination of the exhaustive
application of the transformations, all information about the least model is
contained in the explicit form constraints). Note that each proposition
corresponds to a transformation in $\Delta_2$; each proposition employs the notation
used in the specification of the corresponding transformation.

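Proposition 34 (Transformation 5) Let the application of REDUCE to
$\mathcal{X} \supseteq \{X : (s \in a) \wedge conj\}$ result in the constraints $\mathcal{X} \supseteq se_1, \ldots, \mathcal{X} \supseteq se_n$,
and let $\mathcal{I}$ be an interpretation.

(a) If $\mathcal{I} \models \mathcal{Y} \supseteq a$ and $\{X : (s \in \mathcal{Y}) \wedge conj\}$ is safe with respect to $\mathcal{I}$
then $\mathcal{I}(\{X : (s \in \mathcal{Y}) \wedge conj\}) \supseteq \mathcal{I}(se_i)$, $i = 1..n$.

(b) $\mathcal{I}(\{X : s \in a \wedge conj\}) \subseteq \mathcal{I}(se_1 \cup \cdots \cup se_n)$.

Proof: Consider part (a) and suppose that $\mathcal{I}$ is a model of $\mathcal{Y} \supseteq a$ and
$\{X : (s \in \mathcal{Y}) \wedge conj\}$ is safe with respect to $\mathcal{I}$. This implies that $\mathcal{I}(\mathcal{Y}) \supseteq \mathcal{I}(a)$
and so:

  $\mathcal{I}(s \in a) \subseteq \mathcal{I}(s \in \mathcal{Y})$    (7.29)

Moreover, $\rho \triangleright (s \in a)$ iff $\rho \triangleright (s \in \mathcal{Y})$. Combining this with the assumption
that $\{X : s \in \mathcal{Y} \wedge conj\}$ is safe with respect to $\mathcal{I}$ establishes the three
preconditions of Proposition 30. Hence $\{X : s \in a \wedge conj\}$ is safe with respect
to $\mathcal{I}$. It follows from Lemma 14 that

  $\mathcal{I}(\{X : s \in a \wedge conj\}) \supseteq \mathcal{I}(se_i)$, $i = 1..n$.

Now, (7.29) implies that $\mathcal{I}(\{X : (s \in \mathcal{Y}) \wedge conj\}) \supseteq \mathcal{I}(\{X : (s \in a) \wedge conj\})$.
Hence, $\mathcal{I}(\{X : (s \in \mathcal{Y}) \wedge conj\}) \supseteq \mathcal{I}(se_i)$, $i = 1..n$, and this proves (a).

Finally, part (b) follows immediately from Lemma 14, and this completes
the proof of the proposition. □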

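Proposition 35 (Transformation 6) If $\mathcal{I} \models \mathcal{Y} \supseteq a$ then

  $\mathcal{I}(a_1 \cap \cdots \cap a_{i-1} \cap \mathcal{Y} \cap a_{i+1} \cap \cdots \cap a_n) \supseteq \mathcal{I}(a_1 \cap \cdots \cap a_{i-1} \cap a \cap a_{i+1} \cap \cdots \cap a_n)$.

□

Proposition 36 (Transformation 7) If $\mathcal{I} \models \mathcal{Y} \supseteq a$ then $\mathcal{I}(\mathcal{Y}) \supseteq \mathcal{I}(a)$.

□

Proposition 37 (Transformation 8) For all interpretations $\mathcal{I}$,
$\mathcal{I}(f^{-1}_{(i)}(a)) = \mathcal{I}(\{X_i : f(X_1, \ldots, X_n) \in a\})$.

Proof: The proof follows easily from the following chain of equalities:

  $\mathcal{I}(f^{-1}_{(i)}(a)) = \{v_i : f(v_1, \ldots, v_n) \in \mathcal{I}(a)\}$
  $\qquad = \{\rho(X_i) : \rho(f(X_1, \ldots, X_n)) \in \mathcal{I}(a)\}$
  $\qquad = \mathcal{I}(\{X_i : f(X_1, \ldots, X_n) \in a\})$. □

Proposition 38 (Transformation 9) If $\overline{S_1}, \ldots, \overline{S_n}$ and $a_1', \ldots, a_{m-n}'$ are
two subsequences of $a_1, \ldots, a_m$ such that each $a_i$ is in one of the two
subsequences, then, for all interpretations $\mathcal{I}$,

  $\mathcal{I}(a_1 \cap \cdots \cap a_m) = \mathcal{I}(a_1' \cap \cdots \cap a_{m-n}' \cap \overline{S})$ where $S = S_1 \cup \cdots \cup S_n$.


Proof: The proof follows easily from the following chain of equalities:

  $\mathcal{I}(a_1 \cap \cdots \cap a_m)$
  $\quad = \{v : v \in \mathcal{I}(a_1' \cap \cdots \cap a_{m-n}')$ and $\bigwedge_{i=1..n} (v \notin \mathcal{I}(se)$ for each $se \in S_i)\}$
  $\quad = \{v : v \in \mathcal{I}(a_1' \cap \cdots \cap a_{m-n}')$ and $v \notin \mathcal{I}(se)$ for each $se \in S_1 \cup \cdots \cup S_n\}$
  $\quad = \{v : v \in \mathcal{I}(a_1' \cap \cdots \cap a_{m-n}')$ and $v \notin \mathcal{I}(se)$ for each $se \in S\}$
  $\quad = \mathcal{I}(a_1' \cap \cdots \cap a_{m-n}' \cap \overline{S})$. □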

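Proposition 39 (Transformation 10) If $\mathcal{I}$ is an interpretation such that
for each $j$, $\mathcal{I}(V_{N_j}) = \mathcal{I}(a_{1,j} \cap \cdots \cap a_{m,j})$, then

  $\mathcal{I}(a_1 \cap \cdots \cap a_m \cap \overline{S}) = \mathcal{I}(f(V_{N_1}, \ldots, V_{N_n}) \cap \overline{S})$


Proof: The proof is straightforward:

  $\mathcal{I}(a_1 \cap \cdots \cap a_m \cap \overline{S})$
  $\quad = \mathcal{I}(f(a_{1,1}, \ldots, a_{1,n}) \cap \cdots \cap f(a_{m,1}, \ldots, a_{m,n}) \cap \overline{S})$
  $\quad = \mathcal{I}(f(a_{1,1} \cap \cdots \cap a_{m,1}, \ldots, a_{1,n} \cap \cdots \cap a_{m,n}) \cap \overline{S})$
  $\quad = \mathcal{I}(f(V_{N_1}, \ldots, V_{N_n}) \cap \overline{S})$ □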


Proposition 40(a) (Transformation 11) If $S$ contains no elements of
the form $f(\cdots)$ then, for all interpretations $\mathcal{I}$,

  $\mathcal{I}(f(a_1, \ldots, a_n) \cap \overline{S}) = \mathcal{I}(f(a_1, \ldots, a_n))$.


Proof: The proof follows from the following chain of equalities:

  $\mathcal{I}(f(a_1, \ldots, a_n) \cap \overline{S})$
  $\quad = \{v : v \in \mathcal{I}(f(a_1, \ldots, a_n))$ and $v \notin \mathcal{I}(se)$ for all $se \in S\}$
  $\quad = \{f(v_1, \ldots, v_n) : f(v_1, \ldots, v_n) \in \mathcal{I}(f(a_1, \ldots, a_n))$ and
  $\qquad\qquad f(v_1, \ldots, v_n) \notin \mathcal{I}(se)$ for all $se \in S\}$
  $\quad = \{f(v_1, \ldots, v_n) : f(v_1, \ldots, v_n) \in \mathcal{I}(f(a_1, \ldots, a_n))\}$
  $\quad = \mathcal{I}(f(a_1, \ldots, a_n))$.

The first equality follows from the definition of $\mathcal{I}$. The second step
follows from the fact that any element of $\mathcal{I}(f(a_1, \ldots, a_n))$ must have the form
$f(v_1, \ldots, v_n)$. Now consider the third equality. Recall that a constant of
the form $\overline{S}$ is such that $S$ is a set of atomic set expressions of the form
$g(\cdots)$. Now, by assumption, $S$ does not contain any elements of the form
$f(\cdots)$. Hence, $S$ contains only expressions of the form $g(\cdots)$ where $g \neq f$.
It follows that if $se \in S$ then $\mathcal{I}(se)$ cannot contain any elements of the form
$f(v_1, \ldots, v_n)$, and so the condition $f(v_1, \ldots, v_n) \notin \mathcal{I}(se)$ for all $se \in S$ is
vacuously true. The last equality again follows from the definition of $\mathcal{I}$. □

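Proposition 40(b) (Transformation 11) If $\mathcal{I}$ is an interpretation such
that for each $j$, $\mathcal{I}(V_{N_j}) = \mathcal{I}(a_j \cap \overline{se_j})$, then

  $\mathcal{I}(f(a_1, \ldots, a_n) \cap \overline{S}) = \mathcal{I}\left(\bigcup_{j=1..n} f(a_1, \ldots, a_{j-1}, V_{N_j}, a_{j+1}, \ldots, a_n) \cap \overline{S'}\right)$


Proof: The proof follows from the following chain of equalities:

  $\mathcal{I}(f(a_1, \ldots, a_n) \cap \overline{S})$
  $\quad = \{v : v \in \mathcal{I}(f(a_1, \ldots, a_n))$ and $v \notin \mathcal{I}(f(se_1, \ldots, se_n))\} \cap \mathcal{I}(\overline{S'})$
  $\quad = \{f(v_1, \ldots, v_n) : v_i \in \mathcal{I}(a_i)$ for $i = 1..n$, and
  $\qquad\qquad f(v_1, \ldots, v_n) \notin \mathcal{I}(f(se_1, \ldots, se_n))\} \cap \mathcal{I}(\overline{S'})$
  $\quad = \{f(v_1, \ldots, v_n) : v_i \in \mathcal{I}(a_i)$ for $i = 1..n$, and
  $\qquad\qquad v_j \in \mathcal{I}(\overline{se_j})$ for some $j$, $1 \leq j \leq n\} \cap \mathcal{I}(\overline{S'})$
  $\quad = \bigcup_{j=1..n} \{f(v_1, \ldots, v_n) : v_i \in \mathcal{I}(a_i)$ for $i \neq j$, and
  $\qquad\qquad v_j \in \mathcal{I}(a_j \cap \overline{se_j})\} \cap \mathcal{I}(\overline{S'})$
  $\quad = \bigcup_{j=1..n} \{f(v_1, \ldots, v_n) : v_i \in \mathcal{I}(a_i)$ for $i \neq j$, and
  $\qquad\qquad v_j \in \mathcal{I}(V_{N_j})\} \cap \mathcal{I}(\overline{S'})$
  $\quad = \bigcup_{j=1..n} \mathcal{I}(f(a_1, \ldots, a_{j-1}, V_{N_j}, a_{j+1}, \ldots, a_n)) \cap \mathcal{I}(\overline{S'})$ □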


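Proposition 41 (Transformation 12) Let the application of REDUCE to
$\mathcal{X} \supseteq \{X : s \in \overline{v} \wedge conj\}$ result in the constraints $\mathcal{X} \supseteq se_1, \ldots, \mathcal{X} \supseteq se_n$, and
let $\mathcal{I}$ be an interpretation.


(a) If $v \in \mathcal{I}(a)$ and $\{X : s \dagger a \wedge conj\}$ is safe with respect to $\mathcal{I}$
then $\mathcal{I}(\{X : s \dagger a \wedge conj\}) \supseteq \mathcal{I}(se_i)$, $i = 1..n$.

(b) $\mathcal{I}(\{X : s \in \overline{v} \wedge conj\}) \subseteq \mathcal{I}(se_1 \cup \cdots \cup se_n)$.


Proof: First consider (a). Suppose that $v \in \mathcal{I}(a)$ and $\{X : s \dagger a \wedge conj\}$ is
safe with respect to $\mathcal{I}$ and consider the following chain of implications:

  $\rho \in \mathcal{I}(s \in \overline{v})$ implies $\rho(s) \in \mathcal{I}(\overline{v})$
  $\qquad$ implies $\rho(s) \notin \{v\}$
  $\qquad$ implies $v \neq \rho(s)$
  $\qquad$ implies $\exists v' (v' \neq \rho(s) \wedge v' \in \mathcal{I}(a))$
  $\qquad$ implies $\rho \in \mathcal{I}(s \dagger a)$.

This proves that $\mathcal{I}(s \in \overline{v}) \subseteq \mathcal{I}(s \dagger a)$. Moreover, $\rho \triangleright (s \in \overline{v})$ iff $\rho \triangleright (s \dagger a)$.
Combining this with the assumption that $\{X : s \dagger a \wedge conj\}$ is safe with
respect to $\mathcal{I}$ establishes the preconditions of Proposition 30. This implies
that $\{X : s \in \overline{v} \wedge conj\}$ is safe with respect to $\mathcal{I}$. Hence, Lemma 14 proves
that $\mathcal{I}(\{X : s \in \overline{v} \wedge conj\}) \supseteq \mathcal{I}(se_i)$, $i = 1..n$. Now, $\mathcal{I}(s \in \overline{v}) \subseteq \mathcal{I}(s \dagger a)$ also
implies that $\mathcal{I}(\{X : s \dagger a \wedge conj\}) \supseteq \mathcal{I}(\{X : s \in \overline{v} \wedge conj\})$. It follows that
$\mathcal{I}(\{X : s \dagger a \wedge conj\}) \supseteq \mathcal{I}(se_i)$, $i = 1..n$, and this proves (a).

Finally, part (b) follows immediately from Lemma 14, and this completes
the proof of the proposition. □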

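Proposition 42 (Transformation 13) Let $\mathcal{I}$ be an interpretation. If
$\{X : s \dagger a \wedge conj\}$ is safe with respect to $\mathcal{I}$ and $\mathcal{I}(a)$ contains more than one
element then $\mathcal{I}(\{X : s \dagger a \wedge conj\}) = \mathcal{I}(\{X : conj\})$.


Proof: If $\rho \triangleright s$ then $\rho \in \mathcal{I}(s \dagger a)$ iff $\exists v' (v' \neq \rho(s) \wedge v' \in \mathcal{I}(a))$, but since
$\mathcal{I}(a)$ contains more than one element, $v'$ can always be chosen to be different
from $\rho(s)$. Hence $\rho \in \mathcal{I}(s \dagger a)$ is true just in case $\rho \triangleright s$. Therefore, to prove
the proposition it suffices to show that

  $\rho \in \mathcal{I}(conj)$ implies $\rho \triangleright s$

and this implication can be proved as follows. Suppose that $\rho \triangleright s$
does not hold. This implies that $\rho \notin \mathcal{I}(s \dagger a \wedge conj)$. Since $\{X : s \dagger a \wedge conj\}$ is
safe with respect to $\mathcal{I}$, there exists a quantified condition $exp_\rho$ in $s \dagger a \wedge conj$
such that $\rho \triangleright exp_\rho$ and $\rho \notin \mathcal{I}(exp_\rho)$. Clearly $exp_\rho$ cannot be $s \dagger a$ because it
has just been shown that $\rho \in \mathcal{I}(s \dagger a)$ iff $\rho \triangleright s$. Hence $exp_\rho$ must appear in
$conj$ and it follows that $\rho \notin \mathcal{I}(conj)$. □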

Proposition  43  (Transformation  14)  Let  X  he  an  interpretation  and 
let  conj  be  a  conjunction  of  quantified  conditions  in  compaction  form.  If 
I(fvf  conj)  is  non-empty  for  each  program  variable  Y  appearing  in  conj,  then 
X(^conj)  =  X({X  :  conj}). 
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Proof:  Let  Xi,...,Xn  be  a  listing  of  the  program  variables  appearing  in 
{X  :  conj}  and  let  X  be  Xk.  Since  conj  is  in  compaction  form,  each  of  its 
quantified  condition  is  of  the  form  Xi  €  a  where  a  is  atomic.  For  each  i, 
let  Xi  €  0,1,1,..  .,Xi£  a,-,nj  be  the  quantified  conditions  in  conj  of  the  form 
Xi  €  a.  Now  consider  the  following  chain  of  containments  and  equalities: 


I  ({X  :  conj}) 


{piXk)  :  P  e  I(oonj)} 

{p(Xk) :  Vi  (p(Xi)  e  n  •  •  •  n 
{piXk):^i{piXi)eI{conjx,))} 
{p(Xjt)  :  piXk)  e  I(conjxj} 
Xi^conj). 


The fifth equality (which removes the universal quantifier) follows from the fact that each a_{i,1} ∩ ··· ∩ a_{i,n_i} is non-empty in lm(explicit(C)) and since lm(explicit(C)) ⊆ I, this implies that I(a_{i,1} ∩ ··· ∩ a_{i,n_i}) ≠ {}. []


We now use these basic propositions to prove that the transformations are sound (that is, they preserve the least model). Note that a number of the propositions have side conditions relating to intersection variables and safeness. We therefore prove soundness in conjunction with two invariants.


Invariant 4 (Intersection Variable Invariant) C satisfies the intersection variable invariant if V_N ∈ var(C) implies that lm(C) |= V_N = ∩N. []

Invariant 5 (Safeness Invariant) C satisfies the safeness invariant if C is safe with respect to lm(C). []

The key part of the proof that A2 is sound is the following proposition, which says that the transformations are sound if both invariants are satisfied, and moreover, that the transformations preserve the invariants.


Proposition 44 Let C be a collection of constraints that satisfies the intersection variable invariant and the safeness invariant. Then, for each transformation δ in A2,

1. lm(C ∪ δ(C)) =_{var(C)} lm(C);

2. C ∪ δ(C) satisfies the intersection variable invariant; and

3. C ∪ δ(C) satisfies the safeness invariant.


Proof (No New Variables): This case covers all transformation applications except those that may introduce new variables. Specifically, it excludes Transformation 10 and case (b) of Transformation 11. First consider part (1). Since δ does not introduce new variables, part (1) reduces to lm(C ∪ δ(C)) = lm(C). It is easy to verify that lm(C ∪ δ(C)) ⊇ lm(C) (Proposition 50 in Appendix I contains a proof of this in a very general setting). Hence it suffices to show that lm(C) is a model of δ(C). Let I denote lm(C). Since C satisfies the safeness invariant, each quantified set expression in C is safe with respect to I. It follows that the preconditions of Propositions 34-38, 40(a) and 41-43 (corresponding to Transformations 5-9, case (a) of Transformation 11, and Transformations 12-14 respectively) are satisfied. These propositions imply that δ(C) consists of a collection of constraints 𝒳 ⊇ se_1, ..., 𝒳 ⊇ se_n, n ≥ 1, such that there is a constraint of the form 𝒳 ⊇ se in C and I(se) ⊇ I(se_i), i = 1..n. Now, since 𝒳 ⊇ se appears in C, it must be the case that I(𝒳) ⊇ I(se). Hence I(𝒳) ⊇ I(se_i), i = 1..n, and this proves that I is a model of δ(C), and completes the proof of (1).
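The shape of this argument (the least model of C already satisfies the added constraints, so adding them cannot change the least model) can be illustrated on a much simpler constraint language. The sketch below is an assumption-laden toy: constraints are pairs `(lhs, rhs)` read as "lhs ⊇ rhs", where `rhs` is either a variable name or a finite set of values. It is not the constraint language of this chapter, and `least_model` and `is_model` are hypothetical helper names.

```python
def value_of(model, rhs):
    """Interpret a right-hand side: a variable name or a constant set."""
    return model[rhs] if isinstance(rhs, str) else set(rhs)


def least_model(constraints, variables):
    """Least model of constraints 'lhs ⊇ rhs', computed by iteration to a fixed point."""
    model = {x: set() for x in variables}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in constraints:
            values = value_of(model, rhs)
            if not values <= model[lhs]:
                model[lhs] |= values
                changed = True
    return model


def is_model(model, constraints):
    """Check that every constraint 'lhs ⊇ rhs' holds in the given model."""
    return all(value_of(model, rhs) <= model[lhs] for lhs, rhs in constraints)


variables = ["X", "Y", "Z"]
C = [("X", frozenset({"a"})), ("Y", "X"), ("Z", "Y")]
# Constraints that follow from C; they play the role of the added constraints.
added = [("Z", frozenset({"a"})), ("Z", "X")]

lm_C = least_model(C, variables)
lm_C_plus = least_model(C + added, variables)
```

Here `lm_C` is a model of `added`, and the least model of the enlarged collection coincides with `lm_C`, mirroring the two halves of the argument for part (1).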

Consider part (2). Since all intersection variables in C ∪ δ(C) appear in C, part (2) immediately follows from part (1) and the fact that C satisfies the intersection variable invariant.

Consider part (3) of the proposition. By assumption, each quantified set expression in C is safe with respect to lm(C). Since lm(C) = lm(C ∪ δ(C)), it follows that these same quantified set expressions are also safe with respect to lm(C ∪ δ(C)). It remains to consider the new quantified set expressions introduced by δ(C). The only transformations that can introduce new quantified set expressions are 5, 8, 12 and 13. Let I denote lm(C ∪ δ(C)) and consider each of these transformations in turn.

In the case of Transformation 5, {X : (s ∈ 𝒴) ∧ conj} is safe with respect to I because C is assumed to be safe with respect to I. Moreover, since I is a model of C, I(𝒴) ⊇ I(a), and so I(s ∈ a) ⊆ I(s ∈ 𝒴). Combining this with the fact that ρ ▷ (s ∈ a) iff ρ ▷ (s ∈ 𝒴), establishes the three preconditions of Proposition 30. Hence {X : s ∈ a ∧ conj} is safe with respect to I. Finally, by Lemma 14, it follows that δ(C) = REDUCE(𝒳 ⊇ {X : (s ∈ a) ∧ conj}) is safe with respect to I.

In the case of Transformation 8, it is clear that the quantified set expression {X_i : f(X_1, ..., X_n) ∈ a} is safe with respect to I because ρ ▷ f(X_1, ..., X_n) for all environments ρ. Hence, δ(C) = REDUCE(𝒳 ⊇ {X_i : f(X_1, ..., X_n) ∈ a}) is safe with respect to I, by Lemma 14.

In the case of Transformation 12, {X : (s † a) ∧ conj} is safe with respect to I because C is assumed to be safe with respect to I. Also, since I ⊇ lm(explicit(C)) it follows that I(a) ⊇ I(v). Using this, it is easy to verify that I(s ∈ v) ⊆ I(s † a). Combining this with the fact that ρ ▷ (s ∈ v) iff ρ ▷ (s † a), establishes the three preconditions of Proposition 30. Hence {X : s ∈ v ∧ conj} is safe with respect to I. Finally, by Lemma 14, it follows that δ(C) is safe with respect to I.

Consider Transformation 13. The only new quantified set expression in δ(C) is {X : conj}. Since I ⊇ lm(explicit(C)), it follows that I(a) must contain at least two elements. If ρ ▷ s, then by definition, ρ ∈ I(s † a) iff ∃v'(v' ≠ ρ(s) ∧ v' ∈ I(a)), but since I(a) contains more than one element, v' can always be chosen to be different from ρ(s). Hence, if ρ ▷ s then ρ ∈ I(s † a). Now, suppose that ρ ∉ I(conj). Then ρ ∉ I(s † a ∧ conj). Since C is assumed to be safe with respect to I, it follows that there exists a quantified condition exp_ρ in s † a ∧ conj such that ρ ▷ exp_ρ but ρ ∉ I(exp_ρ). If exp_ρ is s † a, then ρ ▷ s and ρ ∉ I(s † a), but we have just proved that this is not possible. Hence exp_ρ must appear in conj and this completes the proof that δ(C) is safe. []

Proof (Intersection Variables): This case covers transformation applications that may introduce new variables, that is, Transformation 10 and case (b) of Transformation 11. The proof for these transformations builds on a similar proof in Proposition 25 (page 187). Some of the following material duplicates material from Proposition 25; it is repeated because this proof is a key part of the correctness argument.

Consider the intersection variables V_{N_1}, ..., V_{N_n} mentioned by Transformations 10 and 11. For each V_{N_j}, j = 1..n, either V_{N_j} appears in var(C) or else a constraint V_{N_j} ⊇ a_{1,j} ∩ ··· ∩ a_{m,j} is included in δ(C) such that N_j = ∪_{i=1..m} 𝒩(a_{i,j}). Note that each a_{i,j} appears in C. Using this fact, an interpretation I can be defined as follows:

\[
I(\mathcal{X}) =
\begin{cases}
\mathit{lm}(C)(a_{1,j} \cap \cdots \cap a_{m,j}) & \text{if } \mathcal{X} \notin \mathit{var}(C) \text{ and } \mathcal{X} \text{ is } V_{N_j} \\
\mathit{lm}(C)(\mathcal{X}) & \text{otherwise}
\end{cases}
\]

That is, I extends lm(C) to the new intersection variables such that I satisfies each constraint V_{N_j} ⊇ a_{1,j} ∩ ··· ∩ a_{m,j} that appears in δ(C). The main part of the proof for this case is that I = lm(C ∪ δ(C)).

Consider each of the set expressions a_{i,j}. As has already been noted, each a_{i,j} appears in C, and this can be used to prove that each a_{i,j} satisfies the following equation:

\[
I(a_{i,j}) = I\bigl(\textstyle\bigcap \mathcal{N}(a_{i,j})\bigr) \tag{7.30}
\]

To prove this, observe that if a_{i,j} is an intersection variable, then (7.30) follows from the intersection variable invariant for C, and if a_{i,j} is not an intersection variable then ∩𝒩(a_{i,j}) is just a_{i,j}. Equation (7.30) can be used to prove the following chain of equalities

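\begin{align*}
I(a_{1,j} \cap \cdots \cap a_{m,j}) &= I(a_{1,j}) \cap \cdots \cap I(a_{m,j}) \\
&= I\bigl(\textstyle\bigcap \mathcal{N}(a_{1,j})\bigr) \cap \cdots \cap I\bigl(\textstyle\bigcap \mathcal{N}(a_{m,j})\bigr) \\
&= I\bigl(\textstyle\bigcap (\mathcal{N}(a_{1,j}) \cup \cdots \cup \mathcal{N}(a_{m,j}))\bigr) \\
&= I\bigl(\textstyle\bigcap N_j\bigr)
\end{align*}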

which proves that, for all j, I(a_{1,j} ∩ ··· ∩ a_{m,j}) = I(∩N_j). Now, if V_{N_j} is introduced by δ, then I(V_{N_j}) = I(a_{1,j} ∩ ··· ∩ a_{m,j}) by definition of I. On the other hand, if V_{N_j} appears in C then I(V_{N_j}) = I(∩N_j) because C satisfies the intersection variable invariant. Hence, for j = 1..n,

\[
I(V_{N_j}) = I\bigl(\textstyle\bigcap N_j\bigr) = I(a_{1,j} \cap \cdots \cap a_{m,j}) \tag{7.31}
\]

The main use of (7.31) is to prove that I is a model of δ(C). Recall the definitions of Transformations 10 and 11. In both cases, there is a constraint 𝒳 ⊇ se in C such that the constraints in δ(C) are either of the form (a) 𝒳 ⊇ se' or (b) V_{N_j} ⊇ a_{1,j} ∩ ··· ∩ a_{m,j} where V_{N_j} is a new intersection variable. By definition, I is a model of the constraints in (b). Consider a constraint 𝒳 ⊇ se' in (a), and recall Propositions 39, 40(a) and 40(b). It is clear from (7.31) that I satisfies the preconditions of these transformations. It follows that I(se) ⊇ I(se'). Moreover, I is a model of C, and so I(𝒳) ⊇ I(se) ⊇ I(se'). Hence I is a model of 𝒳 ⊇ se'. This completes the proof that I is a model of δ(C).

I is not only a model of C ∪ δ(C), it is in fact the least model. To see this, let I' be an arbitrary model of C ∪ δ(C). If 𝒳 is a variable that is different from the V_{N_j} introduced at this step, then I'(𝒳) ⊇ I(𝒳) because I' ⊇ lm(C) and lm(C)(𝒳) = I(𝒳) by definition of I. On the other hand, for the variables V_{N_j} introduced at this step, consider the following chain:

\[
I'(V_{N_j}) \supseteq I'(a_{1,j} \cap \cdots \cap a_{m,j}) \supseteq I(a_{1,j} \cap \cdots \cap a_{m,j}) = I(V_{N_j}).
\]

The first containment follows because I' is a model of δ(C). The second is because a_{1,j} ∩ ··· ∩ a_{m,j} contains only variables from C, and it has just been proved that I'(𝒳) ⊇ I(𝒳) for variables 𝒳 ∈ var(C). The final equality follows from the definition of I. This completes the proof that lm(C ∪ δ(C)) ⊇ I. Combining this with I ⊇ lm(C ∪ δ(C)) proves that I = lm(C ∪ δ(C)).

Now, by definition, I agrees with lm(C) on var(C), and so lm(C) =_{var(C)} lm(C ∪ δ(C)). This proves part (1) of the proposition. To prove part (2), we need to show that I(V_N) = I(∩N) for all intersection variables V_N appearing in C. Suppose that V_N ∈ var(C). Since C satisfies the intersection variable invariant and I agrees with lm(C) on var(C),

\[
I(V_N) = \mathit{lm}(C)(V_N) = \mathit{lm}(C)\bigl(\textstyle\bigcap N\bigr) = I\bigl(\textstyle\bigcap N\bigr).
\]

On the other hand, if V_N is introduced by δ(C), then I(V_N) = I(∩N) follows from (7.31).

It remains to show that C ∪ δ(C) satisfies the safeness invariant. Now, none of the transformations considered introduce new quantified set expressions. Hence each quantified set expression in C ∪ δ(C) is safe with respect to lm(C). Moreover, these quantified set expressions only involve variables from var(C). Since lm(C) and lm(C ∪ δ(C)) agree on var(C), it is easy to verify that C ∪ δ(C) is safe with respect to lm(C ∪ δ(C)). Hence C ∪ δ(C) satisfies the safeness invariant. []
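The way an interpretation is extended to fresh intersection variables in this proof can be mimicked on finite sets. The sketch below rests on several assumptions: the existing least model is a plain dictionary from variable names to finite sets, each new intersection variable is described by the list of existing variables it intersects, and `extend_with_intersection_variables` is a hypothetical name.

```python
def extend_with_intersection_variables(base, new_variables):
    """Extend a model to new intersection variables.

    base          -- dict mapping existing variable names to sets (stands in for the old least model)
    new_variables -- dict mapping each new variable name to a non-empty list of
                     existing variable names whose intersection it denotes
    """
    extended = {name: set(values) for name, values in base.items()}
    for name, parts in new_variables.items():
        extended[name] = set.intersection(*(set(base[p]) for p in parts))
    return extended


base = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 4, 5}}
extended = extend_with_intersection_variables(base, {"V_AB": ["A", "B"], "V_ABC": ["A", "B", "C"]})
```

The extended model agrees with the old one on all old variables and assigns each new variable exactly the intersection it stands for, which is the smallest value that satisfies the corresponding new constraint.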

It has already been shown that C_0 = REDUCE(standardize(SC_P)) satisfies the safeness invariant. Also, it is clear that C_0 satisfies the intersection variable invariant (since it does not contain any intersection variables). Hence, the previous proposition directly implies that:

Lemma 16 (Soundness) A2 is sound on each C_i constructed by the algorithm.

Proof: Clearly C_0 satisfies the safeness invariant and the intersection variable invariant. Repeated application of Proposition 44 proves that each C_i constructed by the algorithm satisfies both of these invariants. It also proves that A2 is sound on each C_i. []

We  now  prove  that  the  algorithm  terminates.  We  begin  by  showing  that 
A2  is  atomically  bounded. 


Lemma  17  (Atomically  Bounded)  A2  is  atomically  bounded. 


Proof: The proof proceeds by establishing a bound on the atomic set expressions that can be introduced by the algorithm. It has already been established in Proposition 33 that each C_i constructed by the algorithm satisfies the atomic set expression invariant. This implies that each atomic set expression introduced by the algorithm either:

(i) appears in atomic(C_0);

(ii) is introduced by an application of Transformation 12;

(iii) is of the form f(a_1, ..., a_n) where f is a function symbol appearing in C_0, and each a_i is either an intersection variable or a strict subterm of some atomic set expression that falls into cases (i) or (ii); or

(iv) is of the form S̄ such that S ⊆ atomic(S_1 ∪ ··· ∪ S_n) for some complement constants S̄_1, ..., S̄_n satisfying cases (i) or (ii) of the invariant.


Now, the set of atomic set expressions that satisfy (i) is fixed. Moreover, the atomic set expressions that may satisfy (iii) and (iv) are essentially determined by those that satisfy (i) and (ii). The critical item is therefore (ii), since Transformation 12 can introduce completely new atomic set expressions. When this transformation is applied to reduced form constraints C, the effect is to add constraints REDUCE(𝒳 ⊇ {X : s ∈ v ∧ conj}) such that 𝒳 ⊇ {X : s † a ∧ conj} is a constraint in C and lm(explicit(C))(a) = {v}.

Recall the steps of REDUCE (see Figure 7.5, page 199). Clearly these steps do not affect the quantified conditions in conj since conj is in reduced form. Hence, the application of REDUCE to 𝒳 ⊇ {X : s ∈ v ∧ conj} can only involve applications of steps to the quantified condition s ∈ v and any new quantified conditions produced by such steps. In other words, all new quantified conditions in REDUCE(𝒳 ⊇ {X : s ∈ v ∧ conj}) can be traced back to s ∈ v. Thus the new quantified conditions in REDUCE(𝒳 ⊇ {X : s ∈ v ∧ conj}) are the same as the new quantified conditions in REDUCE(𝒳 ⊇ {X : s ∈ v}).


Moreover, any new atomic set expressions introduced by this application of Transformation 12 must be contained in the new quantified conditions. It follows that all new atomic set expressions introduced by this application of Transformation 12 are contained in the set ATM_{s,v} defined by:

\[
\mathit{ATM}_{s,v} = \bigl\{\, a \in \mathit{atomic}(a') : Y \in a' \text{ is a constraint in } \mathrm{REDUCE}(\mathcal{X} \supseteq \{X : s \in v\}) \,\bigr\}
\]

Importantly, if there is another constraint of the form 𝒳' ⊇ {X' : s † a ∧ conj'} in C, then the atomic set expressions introduced by an application of Transformation 12 to this constraint are also contained in ATM_{s,v}.


Now, consider the expressions s † a. It is easy to verify that no transformation introduces new expressions of this form. Hence the only expressions of the form s † a that appear during the algorithm are those that appear in C_0. This means that the new atomic set expressions introduced by Transformation 12 are dependent on a finite set of quantified conditions s † a, and the possible values v such that lm(explicit(C))(a) = {v} for some collection of constraints C constructed during the algorithm. Now, consider the sequence of constraints C_0, C_1, ... constructed by the algorithm. Since these are an increasing sequence of collections, it follows that lm(explicit(C_0)), lm(explicit(C_1)), ... is an increasing sequence of interpretations. Hence, for any atomic set expression a, if lm(explicit(C_i))(a) and lm(explicit(C_j))(a) are both singleton sets, then it must be the case that lm(explicit(C_i))(a) = lm(explicit(C_j))(a). This means that for each a, there is at most one value of v such that lm(explicit(C_i))(a) = {v} for some i. Denote this value (if it exists) by v_a. A key consequence of this is that if an application of Transformation 12 occurs during the algorithm and the application involves the quantified condition s † a, then any new atomic set expressions introduced by this application must be contained in ATM_{s,v_a} where v_a is the value associated with a. In summary then, the atomic set expressions introduced by Transformation 12 must be contained in the finite set

\[
\{\, a' : a' \in \mathit{ATM}_{s,v_a} \text{ and } s \dagger a \text{ appears in } C_0 \,\}
\]
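The step about singleton sets relies only on the sequence being increasing: once a set in the chain is {v}, any later singleton in the chain must contain v and hence equal {v}. A minimal sketch of this fact on finite Python sets follows; the helper names `is_increasing` and `singleton_values` are hypothetical, and the chains are invented test data rather than anything computed by the algorithm.

```python
def is_increasing(chain):
    """True iff each set in the chain is contained in the next one."""
    return all(a <= b for a, b in zip(chain, chain[1:]))


def singleton_values(chain):
    """Collect every value v such that some set in the chain is exactly {v}."""
    return {next(iter(s)) for s in chain if len(s) == 1}


increasing = [set(), {"v"}, {"v"}, {"v", "w"}, {"v", "w", "u"}]
not_increasing = [{"v"}, {"w"}]
```

For an increasing chain the collected set has at most one element, which is what licenses the notation v_a; the second chain shows that the claim can fail without monotonicity.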

Now, consider the union of this set with atomic(C_0). This combined set contains all of the atomic set expressions that can be introduced during algorithm execution that satisfy part (i) or (ii) of the atomic set expression invariant. Let K be the cardinality of this combined set. Now, the atomic set expressions that satisfy parts (iii) and (iv) of the atomic set expression invariant are essentially expressions that are derived from (i) and (ii), and these can be bounded using the bounds on (i) and (ii) as follows.

First consider the atomic set expressions that satisfy (iii). These are of the form f(a_1, ..., a_n) where each a_i appears in (i) or (ii) or is an intersection variable. Now, intersection variables are of the form V_N such that N is some set of atomic set expressions. Moreover, by inspection of Transformations 10 and 11 (these are the only transformations that introduce intersection variables), N contains only atomic set expressions a_j such that f(a_1, ..., a_n) is an atomic set expression that appears at some stage during the algorithm. It follows that the number of intersection variables is bounded by 2^K. Hence the number of atomic set expressions that satisfy (iii) is bounded by F·(K')^n where F is the number of function symbols appearing in C_0, n is the maximum arity of a function symbol in C_0, and K' is

Finally, consider the atomic set expressions that satisfy (iv). These are of the form S̄ such that there are complement constants S̄_1, ..., S̄_n that fall into cases (i) or (ii), and S ⊆ atomic(S_1 ∪ ··· ∪ S_n). Now, a finite bound has already been established for the number of atomic set expressions introduced during the algorithm that satisfy parts (i) and (ii) of the atomic set expression invariant. Hence the set

\[
\mathit{atomic}(\{\, a : a \in S \text{ and } \overline{S} \text{ satisfies (i) or (ii)} \,\})
\]

is finite. Let it have cardinality K''. Then the number of atomic set expressions introduced by the algorithm that satisfy (iv) can be bounded by 2^{K''}. This completes the proof that only a finite number of different atomic set expressions may be introduced by the algorithm. []
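To get a feel for how quickly these bounds grow, they can be evaluated for small parameter values. The sketch below is illustrative only. The function name and its arguments are hypothetical, and the value used for K' (the K expressions from cases (i) and (ii) plus at most 2^K intersection variables) is an assumption made here for the sake of the example, not a definition taken from the text.

```python
def atomic_expression_bounds(K, F, n, K2):
    """Evaluate the counting bounds for small parameters.

    K  -- number of atomic set expressions satisfying (i) or (ii)
    F  -- number of function symbols
    n  -- maximum arity of a function symbol
    K2 -- cardinality written K'' in the text
    """
    intersection_variables = 2 ** K
    # ASSUMPTION: each argument is one of the K expressions or an intersection variable.
    k_prime = K + intersection_variables
    return {
        "intersection_variables": intersection_variables,
        "case_iii": F * k_prime ** n,
        "case_iv": 2 ** K2,
    }


bounds = atomic_expression_bounds(K=3, F=2, n=2, K2=4)
```

Even for these tiny inputs the case (iii) bound is already in the hundreds, which is consistent with the proof only claiming finiteness rather than a practical complexity estimate.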

Lemma 18 (Termination) Let C_0 be a collection of constraints in reduced form and standard form. Then the instance of the generic algorithm defined by A2 terminates on C_0.

Proof: The proof proceeds by establishing a bound on the number of distinct constraints that may be introduced by the algorithm. Since the constraints encountered during the algorithm are in standard form, they must all have one of the following forms:

(a) 𝒳 ⊇ f⁻¹_(i)(a),

(b) 𝒳 ⊇ {X : exp_1 ∧ ··· ∧ exp_m},

(c) 𝒳 ⊇ a_1 ∩ ··· ∩ a_n, n ≥ 2, or

(d) 𝒳 ⊇ a,

where a, a_1, ..., a_n are atomic set expressions, 𝒳 is a set variable, f⁻¹_(i) is a projection symbol, and exp_1, ..., exp_m are quantified conditions. Now, consider these four kinds of constraints in turn.

In case (a), note that no transformation introduces constraints involving projection operations. Hence any constraint in this class must in fact appear in C_0, and this places a trivial bound on the number of these kinds of constraints that can be encountered during the algorithm.

Now consider case (b). Each exp_i is either of the form s ∈ a or s † a such that s is a program term and a is an atomic set expression. A bound has already been established on the number of possible atomic set expressions. Now focus on the program terms s. It has already been observed that the only transformations capable of introducing new quantified set expressions are 5, 8, 12 and 13. Observe that Transformations 5, 12 and 13 only introduce subterms of program terms already appearing. Specifically, when one of Transformations 5, 12 and 13 is applied to constraints C, then any new quantified condition must be of the form s ∈ a or s † a such that C contains a quantified condition s' ∈ a' and s is a subterm of s'.

On the other hand, Transformation 8 may introduce completely new program terms. Specifically, this transformation introduces a program term f(X_1, ..., X_n) for each occurrence of a constraint falling into case (a). As noted previously, it is assumed that the X_1, ..., X_n are chosen in some canonical manner (for example, using some fixed listing of var) so that this transformation cannot be repeatedly applied to a constraint 𝒳 ⊇ f⁻¹_(i)(a) to produce REDUCE(𝒳 ⊇ {X_i : f(X_1, ..., X_n) ∈ a}), and then REDUCE(𝒳 ⊇ {X'_i : f(X'_1, ..., X'_n) ∈ a}), etc. Hence there is exactly one application of Transformation 8 for each occurrence of a constraint that falls into case (a). Clearly the subsequent application of REDUCE (and subsequent applications of other transformations) may introduce subterms of f(X_1, ..., X_n). To summarize then, each constraint 𝒳 ⊇ f⁻¹_(i)(a) may introduce a finite number of new program terms, namely f(X_1, ..., X_n), X_1, ..., X_n. It follows that there is a bound on the number of program terms introduced by Transformation 8.

This means that there is a bound on the number of distinct program terms that may be encountered during the algorithm. Combining this with the bound on atomic set expressions proves that the number of quantified conditions s ∈ a and s † a is bounded. Since each conj is maintained in a non-redundant form, it follows that there is a bound on the number of conjunctions exp_1 ∧ ··· ∧ exp_m. This implies a bound on the expressions {X : exp_1 ∧ ··· ∧ exp_m} because the program variable X must either appear in C or be introduced by an application of Transformation 8. This in turn implies the existence of a bound on the number of constraints of the form 𝒳 ⊇ {X : exp_1 ∧ ··· ∧ exp_m} that can be introduced by the algorithm since there is a bound on the number of set variables 𝒳 (set variables are atomic set expressions).

Finally, consider cases (c) and (d). Since each constraint of the form 𝒳 ⊇ a_1 ∩ ··· ∩ a_n is such that C_0 contains an intersection operator of arity m ≥ n, the bound on atomic set expressions implies a bound on the number of such constraints. Similarly, a bound may be established on the number of constraints of the form 𝒳 ⊇ a.

This completes the proof that the algorithm can only introduce a finite number of distinct constraints, and since C_0, C_1, C_2, ... is an increasing sequence of constraints, it follows that at some stage it must be the case that C_{m+1} = C_m and so the algorithm terminates. []
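The termination argument has a simple operational reading: an increasing sequence of constraint collections drawn from a finite universe must eventually repeat, at which point the loop stops. The sketch below shows that loop for a deliberately tiny stand-in. Constraints are pairs `(x, y)` read as "x ⊇ y" between variable names, and the only transformation is transitivity. Both are assumptions chosen for illustration and are unrelated to the actual transformations of A2; the function names are hypothetical.

```python
def saturate(initial, transformations):
    """Apply all transformations repeatedly until no new constraint appears.

    Returns the increasing chain C_0, C_1, ..., ending at the first
    collection that is closed under every transformation.
    """
    current = frozenset(initial)
    chain = [current]
    while True:
        enlarged = set(current)
        for transform in transformations:
            enlarged |= transform(current)
        if enlarged == current:
            return chain
        current = frozenset(enlarged)
        chain.append(current)


def transitivity(constraints):
    """From x ⊇ y and y ⊇ z derive x ⊇ z."""
    return {(x, z) for (x, y) in constraints for (y2, z) in constraints if y == y2}


chain = saturate({("A", "B"), ("B", "C"), ("C", "D")}, [transitivity])
final = chain[-1]
```

Termination here follows for the same reason as in the lemma: only finitely many pairs over the four variable names exist, and each round either adds a pair or stops.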

It  remains  to  prove  that  A2  is  complete. 


Lemma 19 (Completeness) A2 is complete.

Proof: The proof of completeness has the same basic structure as the completeness proof for the intersection-projection algorithm. Some of the cases (particularly those dealing with intersection) are just adaptations of this previous proof. Let C be the result of exhaustively applying A2 to a collection of constraints. This implies that δ(C) ⊆ C for all transformations δ in A2. Adopting the notation of the generic algorithm, let the sequence of constraints obtained by this exhaustive application be C_0, C_1, ..., C_l where C_l = C. Let D denote the subset of constraints in C of the form 𝒳 ⊇ a where a is a non-variable atomic set expression. Clearly D ⊆ explicit(C) ⊆ C, and so lm(D) ⊆ lm(explicit(C)) ⊆ lm(C). The remainder of the proof shows that lm(D) ⊇ lm(C), and it is clear that this implies lm(explicit(C)) = lm(C), as required by the definition of completeness.


To prove that lm(D) ⊇ lm(C), we shall show that lm(D) is a model of C. Let I_D denote lm(D). Proposition 17 provides the following characterization of I_D:

\[
v \in I_D(\mathcal{X}) \quad\text{iff}\quad v \in I_D(a) \text{ for some constraint } \mathcal{X} \supseteq a \text{ in } C \text{ where } a \text{ is a non-variable atomic set expression} \tag{7.32}
\]

The remainder of the proof uses this fact to show that I_D is a model of C. Consider each possible constraint in C in turn:


Case (i): Consider a constraint of the form 𝒳 ⊇ a where a is an atomic set expression. The proof here is identical to that in Lemma 13 (completeness of the intersection-projection algorithm).

Case (ii): Consider a constraint of the form 𝒳 ⊇ a_1 ∩ ··· ∩ a_m where each a_i is an atomic set expression and m ≥ 2. The proof is by induction on v and the induction hypothesis is: for all values v and for all constraints 𝒳 ⊇ a_1 ∩ ··· ∩ a_m, m ≥ 2, appearing in C,

(a) v ∈ I_D(a_1 ∩ ··· ∩ a_m) implies v ∈ I_D(𝒳), and

(b) if 𝒳 ⊇ a_1 ∩ ··· ∩ a_m is introduced by an application of Transformation 10 or 11 then v ∈ I_D(𝒳) implies v ∈ I_D(a_1 ∩ ··· ∩ a_m).

Let v be a value such that the induction hypothesis holds for all values with fewer function symbols than v. Before considering (a) and (b), it is convenient to first prove the following statement: if v' has fewer symbols than v and a_1, ..., a_k appear in C_i then

\[
v' \in I_D(a_1 \cap \cdots \cap a_k) \quad\text{iff}\quad \bigwedge_{a \in N} v' \in I_D(a) \qquad\text{where } N = \bigcup_{j=1..k} \mathcal{N}(a_j) \tag{7.33}
\]

This is proved by a secondary induction on i. Suppose that (7.33) holds for all i' < i. Let a_1, ..., a_k appear in C_i and consider the following chain of propositions.

\begin{align*}
v' \in I_D(a_1 \cap \cdots \cap a_k) \quad&\text{iff}\quad \bigwedge_{j=1..k} v' \in I_D(a_j) \\
&\text{iff}\quad \bigwedge_{j=1..k} \; \bigwedge_{a \in \mathcal{N}(a_j)} v' \in I_D(a) \\
&\text{iff}\quad \bigwedge_{a \in N} v' \in I_D(a) \qquad\text{where } N = \bigcup_{j=1..k} \mathcal{N}(a_j)
\end{align*}

The first step is just an expansion of ∩. For the second step, take each a_j in turn and consider two cases. If a_j is not an intersection variable then 𝒩(a_j) = {a_j} and so the second step is trivial. On the other hand, suppose that a_j is an intersection variable, say V_{N_j}. Corresponding to V_{N_j}, there exists a constraint V_{N_j} ⊇ a'_1 ∩ ··· ∩ a'_l, l ≥ 2, that is introduced by Transformation 10 or 11. Moreover, this constraint must appear in C_{i-1} and N_j = 𝒩(a'_1) ∪ ··· ∪ 𝒩(a'_l). Now, since v' is smaller than v and C_{i-1} is constructed before C_i, the main induction hypothesis and the secondary induction hypothesis respectively imply that

\begin{align*}
v' \in I_D(a'_1 \cap \cdots \cap a'_l) \quad&\text{iff}\quad v' \in I_D(V_{N_j}) \quad\text{and} \\
v' \in I_D(a'_1 \cap \cdots \cap a'_l) \quad&\text{iff}\quad \bigwedge_{a \in N_j} v' \in I_D(a)
\end{align*}

and the second step follows immediately. The final step in the chain follows from the definition of N. This completes the inductive proof of (7.33). The following key property is an immediate corollary of (7.33): if v' has fewer symbols than v, and V_N, a_1, ..., a_k appear in C_i where N = 𝒩(a_1) ∪ ··· ∪ 𝒩(a_k), then

\[
v' \in I_D(V_N) \quad\text{iff}\quad v' \in I_D(a_1 \cap \cdots \cap a_k) \tag{7.34}
\]
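The flattening of intersection variables into their underlying atomic set expressions can be sketched concretely. In the example below an intersection variable is modelled as a tuple `("V", N)` with `N` a frozenset of names, any other atomic set expression is a plain string, and the interpretation is a dictionary of finite sets. All of these encodings and the names `flatten` and `in_intersection` are assumptions made for illustration.

```python
def flatten(expr):
    """Return the set of atomic expressions that expr stands for.

    An intersection variable ("V", N) flattens to N; anything else to {expr}.
    """
    if isinstance(expr, tuple) and expr[0] == "V":
        return set(expr[1])
    return {expr}


def in_intersection(value, exprs, interp):
    """Membership in an intersection, decided via the flattened set N."""
    n = set().union(*(flatten(e) for e in exprs))
    return all(value in interp[a] for a in n)


interp = {"a": {1, 2}, "b": {2, 3}, "c": {2, 5}}
v_ab = ("V", frozenset({"a", "b"}))
flattened = set().union(*(flatten(e) for e in [v_ab, "c"]))
```

Membership in the intersection of `v_ab` and `"c"` is then decided by testing the value against each of `a`, `b` and `c`, which is the same shape as the conjunction over N in the displayed equivalence.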

Now consider part (a) of the main induction hypothesis. Assume that v ∈ I_D(a_1 ∩ ··· ∩ a_m). It follows that v ∈ I_D(a_i), i = 1..m. Now, if one of the a_i is a set variable, say 𝒴, then (7.32) implies that there exists a constraint 𝒴 ⊇ a in C where a is a non-variable atomic set expression such that v ∈ I_D(a). This means that the preconditions of Transformation 6 are satisfied, and so the constraint 𝒳 ⊇ a_1 ∩ ··· ∩ a_{i-1} ∩ a ∩ a_{i+1} ∩ ··· ∩ a_m must appear in C.
This argument may be repeated if necessary, and it follows that C must contain a constraint of the form 𝒳 ⊇ a_1 ∩ ··· ∩ a_m where each a_i is a non-variable atomic set expression and v ∈ I_D(a_1 ∩ ··· ∩ a_m). Now, Transformation 9 can be applied to this constraint, and since this application does not produce any new constraints, it must be the case that C contains a constraint of the form 𝒳 ⊇ a_1 ∩ ··· ∩ a_m ∩ S̄ such that v ∈ I_D(a_1 ∩ ··· ∩ a_m ∩ S̄) and each a_i is an atomic set expression that is not a set variable or a complement constant. Let v be f(v_1, ..., v_n). It follows that each a_i must be of the form f(a_{i,1}, ..., a_{i,n}) such that v_j ∈ I_D(a_{i,j}), i = 1..m, j = 1..n. This implies that v_j ∈ I_D(a_{1,j} ∩ ··· ∩ a_{m,j}). Since C contains all constraints generated by Transformation 10, it follows that C contains 𝒳 ⊇ f(V_{N_1}, ..., V_{N_n}) ∩ S̄ such that N_j = 𝒩(a_{1,j}) ∪ ··· ∪ 𝒩(a_{m,j}), j = 1..n. By (7.34), v_j ∈ I_D(V_{N_j}). Hence v ∈ I_D(f(V_{N_1}, ..., V_{N_n}) ∩ S̄).

Hence, C must contain a constraint of the form 𝒳 ⊇ f(a_1, ..., a_n) ∩ S̄ such that f(v_1, ..., v_n) ∈ I_D(f(a_1, ..., a_n) ∩ S̄). In fact C may contain a number of constraints of this form. Pick the constraint that minimizes the cardinality of the set S. Now, Transformation 10 can be applied to this constraint. Suppose that S contains an element of the form f(se_1, ..., se_n). This implies that C must contain the constraints

\[
\mathcal{X} \supseteq f(a_1, \ldots, a_{j-1}, V_{N_j}, a_{j+1}, \ldots, a_n) \cap \overline{S'}, \qquad j = 1..n
\]

where S' is S − {f(se_1, ..., se_n)} and N_j is 𝒩(a_j) ∪ {se_j}, j = 1..n. Since f(v_1, ..., v_n) ∈ I_D(S̄), it must be the case that

\[
f(v_1, \ldots, v_n) \notin I_D(f(se_1, \ldots, se_n)).
\]

Hence,  for  some  I,  vi  ^  lT){sei).  We  shall  now  argue  that 

,  Vn)  €  Xvifiau  •  •  • » 0/-1 1  Vni,  a/+i, . . . , a„)  n  '^)  (7.35) 

Clearly  /(wi, . . . ,  %)  G  2i>(5')  amd  so  it  suffices  to  prove  that  /(wi, . . . ,  Vn)  € 
Xvifiai a/-i ,  VW, , a/+i , . . . , On)).  Since  vj  G  Iv(ai)  and  vj  G  Sef,  it  fol¬ 
lows  from  (7.34)  that  wj  G  Tv(Vni)-  Moreover,  Vi  G  Ii>(a;),  i  =  l..n.  Hence 
/(vi,.,.,Vn)  €  Xp(/(ai,, . . . . .  ,a„))  and  this  completes  the 
proof  of  (7.35). 
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Now,  S'  is  smaller  than  S  and  this  violates  the  assumption  that  the  con¬ 
straint  X  2  /(oi , . . . ,  On)  n  5  minimizes  the  cardinality  of  the  set  S.  Hence, 
the  assumption  that  S  contains  an  element  of  the  form  /(sei, . . .  ,sen)  must 
not  be  valid.  This  implies  that  an  application  of  Transformation  10  to 
the  constraint  X  2  /(ai,...,an)  n  5  must  in  fact  produce  the  constraint 
X  2  /(oi)  •  •  •  >  On)-  Moreover,  C  must  already  contain  this  constraint.  Hence 
T>  contains  X  2  /(oi»  •  •  •  )On)j  a-nd  it  follows  that  v  €  Ix)(/l’),  and  this  com¬ 
pletes  the  proof  of  (a). 

To prove (b), suppose that $\mathcal{X} \supseteq se$ is introduced by an application of
Transformation 11 or 10. By inspection of Transformations 11 and 10, $\mathcal{X}$
must be an intersection variable. The first part of the proof shall establish
that if $\mathcal{X} \supseteq se'$ appears in $\mathcal{C}$, then

$v \in \mathcal{I}_{\mathcal{D}}(se')$ implies $v \in \mathcal{I}_{\mathcal{D}}(se)$ (7.36)

The proof proceeds by induction. Suppose that (7.36) holds for all $i' < i$
and let $\mathcal{X} \supseteq se'$ be a constraint in $\mathcal{C}_i$ and let $v$ be a value in $\mathcal{I}_{\mathcal{D}}(se')$.
Now, either $\mathcal{X} \supseteq se'$ appears for the first time in $\mathcal{C}_i$, or else $\mathcal{C}_{i-1}$ contains a
constraint of the form $\mathcal{X} \supseteq se'$. In the first case, (7.36) is vacuously true.
Now consider the second case. Clearly the only transformations that could
add the constraint $\mathcal{X} \supseteq se'$ are 6, 9, 10 and 11. We consider each of these
in turn.

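If Transformation 6 is used to obtain $\mathcal{X} \supseteq se'$, then $\mathcal{C}_{i-1}$ must contain
constraints $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_{j-1} \cap \mathcal{Y} \cap a_{j+1} \cap \cdots \cap a_n$ and $\mathcal{Y} \supseteq a$ such that
$n \geq 2$, $a$ is an atomic set expression that is not a set variable, and $se'$ is
$a_1 \cap \cdots \cap a_{j-1} \cap a \cap a_{j+1} \cap \cdots \cap a_n$. Clearly $\mathcal{Y} \supseteq a$ appears in $\mathcal{D}$. Now,
suppose that $v \in \mathcal{I}_{\mathcal{D}}(se')$. Thus $v \in \mathcal{I}_{\mathcal{D}}(a_k)$, $k \neq j$, and $v \in \mathcal{I}_{\mathcal{D}}(a)$. Since $\mathcal{Y} \supseteq a$
appears in $\mathcal{D}$, $v \in \mathcal{I}_{\mathcal{D}}(\mathcal{Y})$, and so

$v \in \mathcal{I}_{\mathcal{D}}(a_1 \cap \cdots \cap a_{j-1} \cap \mathcal{Y} \cap a_{j+1} \cap \cdots \cap a_n)$

Since this constraint appears in $\mathcal{C}_{i-1}$ and $\mathcal{C}_{i-1}$ satisfies (7.36), it follows that
$v \in \mathcal{I}_{\mathcal{D}}(se)$ and this proves that (7.36) holds for $i$.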

If  Transformation  9  is  used  to  obtain  X  D  se',  then  Ci-i  must  contain  a 
constraint  X  D  aifl  -  ■  -nam,  m  >  2,  such  that  se'  is  /V  2  H  •  •  •  n n S 
where  5i , . . . ,  5„  and  , . . . ,  are  subsequences  of  ai , . . . ,  a^  such  that 
the  first  subsequence  contains  the  complement  constants  in  ai, . . .  ,a,„,  and 
the second contains the remaining atomic set expressions, and $S = S_1 \cup \cdots \cup S_n$.
Now, suppose that $v \in \mathcal{I}_{\mathcal{D}}(se')$. It is easy to verify that this implies
$v \in \mathcal{I}_{\mathcal{D}}(a_i)$, $i = 1..m$, and so $v \in \mathcal{I}_{\mathcal{D}}(a_1 \cap \cdots \cap a_m)$. Since this constraint
appears in $\mathcal{C}_{i-1}$ and $\mathcal{C}_{i-1}$ satisfies (7.36), it follows that $v \in \mathcal{I}_{\mathcal{D}}(se)$ and this
proves that (7.36) holds for $i$.

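If Transformation 10 is used to obtain $\mathcal{X} \supseteq se'$, then $\mathcal{C}_{i-1}$ must contain
a constraint $\mathcal{X} \supseteq a_1 \cap \cdots \cap a_m \cap S$ such that $m \geq 2$, each $a_i$ is
of the form $f(a_{i,1}, \ldots, a_{i,n})$, and $se'$ is $f(\mathcal{V}_{N_1}, \ldots, \mathcal{V}_{N_n}) \cap S$ where
$N_j = \mathcal{N}(a_{1,j}) \cup \cdots \cup \mathcal{N}(a_{m,j})$, $j = 1..n$. Now, suppose that
$f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(se')$.
This implies that $v_j \in \mathcal{I}_{\mathcal{D}}(\mathcal{V}_{N_j})$, $j = 1..n$, and $f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(S)$. Since
each $v_j$ has fewer symbols than $v$, property (7.34) can be applied to show
that $v_j \in \mathcal{I}_{\mathcal{D}}(a_{1,j} \cap \cdots \cap a_{m,j})$. It follows that $v_j \in \mathcal{I}_{\mathcal{D}}(a_{i,j})$, $j = 1..n$, $i = 1..m$,
and so $v \in \mathcal{I}_{\mathcal{D}}(f(a_{i,1}, \ldots, a_{i,n}))$, $i = 1..m$. Hence $v \in \mathcal{I}_{\mathcal{D}}(a_1 \cap \cdots \cap a_m \cap S)$.
Since this constraint appears in $\mathcal{C}_{i-1}$ and $\mathcal{C}_{i-1}$ satisfies (7.36), it follows that
$v \in \mathcal{I}_{\mathcal{D}}(se)$ and this proves that (7.36) holds for $i$.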

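Finally, if Transformation 11 is used to obtain $\mathcal{X} \supseteq se'$, then $\mathcal{C}_{i-1}$ must
contain a constraint $\mathcal{X} \supseteq f(a_1, \ldots, a_n) \cap S$ such that $\overline{f(\top, \ldots, \top)} \notin S$ and
either (i) $se'$ is $f(a_1, \ldots, a_n)$ and $\overline{f'(\cdots)} \in S$ implies $f' \neq f$, or else
(ii) $se'$ is $f(a_1, \ldots, a_{k-1}, \mathcal{V}_{N_k}, a_{k+1}, \ldots, a_n) \cap S'$, for some $k$, $1 \leq k \leq n$,
such that $\overline{f(se_1, \ldots, se_n)} \in S$, $S'$ is $S - \{\overline{f(se_1, \ldots, se_n)}\}$ and $N_k$ is
$\mathcal{N}(a_k) \cup \{\overline{se_k}\}$. Now, suppose that $f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(se')$. In case (i),
it is immediate that $f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(f(a_1, \ldots, a_n))$. It is also easy to
verify that $f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(S)$, and so
$f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(f(a_1, \ldots, a_n) \cap S)$.
In case (ii), $v_j \in \mathcal{I}_{\mathcal{D}}(a_j)$, $j \neq k$, $v_k \in \mathcal{I}_{\mathcal{D}}(\mathcal{V}_{N_k})$ and
$f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(S')$. Applying (7.34) proves that
$v_k \in \mathcal{I}_{\mathcal{D}}(a_k \cap \overline{se_k})$. Hence $v_k \in \mathcal{I}_{\mathcal{D}}(a_k)$.
Also, $v_k \in \mathcal{I}_{\mathcal{D}}(\overline{se_k})$, and it is easy to verify that this implies
$f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(\overline{f(se_1, \ldots, se_n)})$. Combining this with
$f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(S')$ proves that $f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(S)$.
Thus, in either case, $f(v_1, \ldots, v_n) \in \mathcal{I}_{\mathcal{D}}(f(a_1, \ldots, a_n) \cap S)$. Since
this constraint appears in $\mathcal{C}_{i-1}$ and $\mathcal{C}_{i-1}$ satisfies (7.36), it follows that $v \in \mathcal{I}_{\mathcal{D}}(se)$
and this proves that (7.36) holds for $i$.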

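This completes the proof of (7.36), and (b) can now be proved as follows.
If $v \in \mathcal{I}_{\mathcal{D}}(\mathcal{X})$ then there exists a constraint $\mathcal{X} \supseteq a$ in $\mathcal{D}$ such that $v \in \mathcal{I}_{\mathcal{D}}(a)$.
Since $\mathcal{X} \supseteq a$ also appears in $\mathcal{C}$, it follows from (7.36) that
$v \in \mathcal{I}_{\mathcal{D}}(a_1 \cap \cdots \cap a_m)$.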

Case (iii): Consider a constraint of the form $\mathcal{X} \supseteq \{X : conj\}$. Suppose
that $v \in \mathcal{I}_{\mathcal{D}}(\{X : conj\})$. This implies that there is an environment $\rho$ such
that $\rho \in \mathcal{I}_{\mathcal{D}}(conj)$ and $\rho(X) = v$. Now, consider the following cases of conj:

Case (iii)(a): Suppose that conj is in compaction form. Since ρ ∈ I_D({X :
conj}),  it  follows  that  Ixt{f!\conj)  is  non-empty  for  each  Y  appearing  in 
conj,  and  hence  each  lm(explicit(C))((^conj)  is  also  non-empty.  By  Propo¬ 
sition  43,  :  conj})  =  T-oi^conj).  It  follows  that  v  6  conj)- 

Moreover,  since  each  lm{explicit(C))(f\ conj)  is  non-empty,  the  precondi¬ 
tions  of  Transformation  14  are  satisfied,  and  so  C  must  contain  the  con¬ 
straint  /V  D  conj.  This  constraint  falls  either  into  case  (i)  or  (ii)  consid¬ 
ered  above.  Since  these  cases  have  already  been  established  for  v,  it  follows 
that  V  €  Iz)(A'). 

Case (iii)(b): Suppose that conj does not contain any apartness conditions.
Since conj is in reduced form, each condition in conj is
either of the form $X \in a$ or $s \in \mathcal{X}$ where $X$ is a program variable, $s$ is a
program term consisting of program variables and function symbols, $a$ is an
atomic set expression and $\mathcal{X}$ is a set variable. Now, let $s_1 \in a_1, \ldots, s_n \in a_n$
be a listing of the quantified conditions in conj that do not contain set
variables (such conditions must be of the form $s \in a$ where $a$ is ground).
Define $V(conj)$ to be the number of function symbols appearing in $s_1, \ldots, s_n$.
The proof for this case shall be argued by induction on $V$. In the base case
where $V(conj)$ is 0, each quantified condition in conj must have the form
$X \in a$ where $X$ is a program variable. Hence conj is in compaction form,
and so the proof for the base case follows from case (iii)(a).

For the induction case, suppose that $V(conj) \leq j$ implies that
$v \in \mathcal{I}_{\mathcal{D}}(\mathcal{X})$, and consider conj such that $V(conj) = j + 1$. Since $V(conj) > 0$,
it must be the case that conj is of the form $conj' \wedge s \in a$ such that $s$
contains some function symbols. This implies that $a$ must be a variable, say
$\mathcal{Y}$. Moreover, $\rho$ is such that $\rho(s) \in \mathcal{I}_{\mathcal{D}}(\mathcal{Y})$. Hence, there exists a constraint
$\mathcal{Y} \supseteq a'$ in $\mathcal{C}$ such that $a'$ is a non-variable atomic expression and
$\rho(s) \in \mathcal{I}_{\mathcal{D}}(a')$. Hence Transformation 5 is applicable and it follows that $\mathcal{C}$ must
contain $reduce(\mathcal{X} \supseteq \{X : conj' \wedge s \in a'\})$. Clearly
$v \in \mathcal{I}_{\mathcal{D}}(\{X : conj' \wedge s \in a'\})$. Moreover, Lemma 14 proves that there exists a
constraint $\mathcal{X} \supseteq se'$ in $reduce(\mathcal{X} \supseteq \{X : conj' \wedge s \in a'\})$ such that
$v \in \mathcal{I}_{\mathcal{D}}(se')$. By inspection
of the steps that make up reduce, it is clear that $se'$ is in fact of the form
$\{X : conj''\}$. It remains to show that $V(conj'') < V(conj)$.

Now, it is easy to verify that each step of reduce does not increase
$V$. In fact, except for step (iv), all steps replace a quantified set expression
by a (possibly empty) collection of quantified set expressions such that each
new quantified set expression contains fewer function symbols in its program
terms than the original quantified set expression. In other words, for these
steps, the new quantified set expressions have a strictly smaller $V$ than the
original quantified set expression. Step (iv), on the other hand, may introduce
function symbols (by duplicating a program term), but the quantified
conditions involved are of the form $s'' \in a''$ where $a''$ is ground, and hence
they do not contribute to $V$, and so step (iv) leaves $V$ unchanged. Now,
consider the quantified condition $s \in a'$ constructed by Transformation 5.
Clearly at least one step of reduce is applicable to $s \in a'$ (since $s$ is not a
program variable, and $a'$ is not a set variable). Now, consider two cases. If
$a'$ is ground, then $V(conj' \wedge s \in a') < V(conj' \wedge s \in a)$, and since reduce
does not increase $V$, it follows that $V(conj'') < V(conj)$. If $a'$ is not ground,
then one of the steps other than (iv) is applicable, and again it follows that
$V(conj'') < V(conj)$. This completes the inductive proof of case (iii)(b).

Case (iii)(c): Suppose that conj contains an apartness condition. The
proof for this case is by induction on the number of apartness conditions in
conj. If conj does not contain any such conditions, then it falls into case
(iii)(b), and this proves the base case. For the induction case, suppose that
if conj has fewer than $k \geq 1$ apartness conditions then $v \in \mathcal{I}_{\mathcal{D}}(\mathcal{X})$, and consider
the case where conj has exactly $k$ apartness conditions. Clearly conj has at
least one apartness condition, say between a program term $s$ and an atomic set
expression $a$. Since $\rho \in \mathcal{I}_{\mathcal{D}}(conj)$, it follows that $\rho$ satisfies this
condition. Hence there is some value $v'$ in $\mathcal{I}_{\mathcal{D}}(a)$ such that $\rho(s) \neq v'$.
Since $\mathcal{I}_{\mathcal{D}} \subseteq lm(explicit(\mathcal{C}))$, this implies that either Transformation 12 or 13
is applicable, depending on whether $v'$ is the only value in $lm(explicit(\mathcal{C}))(a)$.
In either case, Proposition 41 or Proposition 42 can be applied to show that
there is a constraint $\mathcal{X} \supseteq \{X : conj'\}$ in $\mathcal{C}$ such that $v \in \mathcal{I}_{\mathcal{D}}(\{X : conj'\})$
and $conj'$ contains exactly one fewer apartness condition than conj. Hence
$v \in \mathcal{I}_{\mathcal{D}}(\mathcal{X})$ follows from the induction hypothesis.

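Case (iv): Consider a constraint of the form $\mathcal{X} \supseteq f^{-1}_{(i)}(a)$. Suppose that
$v \in \mathcal{I}_{\mathcal{D}}(f^{-1}_{(i)}(a))$. Clearly Transformation 8 is applicable, and so $\mathcal{C}$ contains
the constraint $\mathcal{X} \supseteq \{X_i : f(X_1, \ldots, X_n) \in a\}$. By Proposition 37,
$\mathcal{I}_{\mathcal{D}}(\{X_i : f(X_1, \ldots, X_n) \in a\}) = \mathcal{I}_{\mathcal{D}}(f^{-1}_{(i)}(a))$. It follows that
$v \in \mathcal{I}_{\mathcal{D}}(\{X_i : f(X_1, \ldots, X_n) \in a\})$. Now, since
$\mathcal{X} \supseteq \{X_i : f(X_1, \ldots, X_n) \in a\}$ appears in $\mathcal{C}$, case (iii) can be applied to
prove that $v \in \mathcal{I}_{\mathcal{D}}(\mathcal{X})$, and this completes the proof for this case. $\Box$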

Combining the above lemmas with Theorem 7 proves that:

Theorem 9 (Correctness of Quantified Set Expression Algorithm)
Let $SC_P$ be the set constraints corresponding to a program $P$, and let
$\mathcal{C}_0 = reduce(standardize(SC_P))$. When input with $\mathcal{C}_0$, the instance of the
generic algorithm defined by A2 terminates and outputs explicit form constraints
$\mathcal{C}_{out}$ such that $lm(\mathcal{C}_{out}) =_{var(SC_P)} lm(SC_P)$.

We note that many decisions about the design of the algorithm were
made to simplify presentation at the expense of efficiency. We now outline
a number of these. First, consider the transformation involving substitution
into quantified set expressions (Transformation 5). If $\mathcal{C}$ contains

$\mathcal{X} \supseteq \{X : X \in \mathcal{Y} \wedge f(X) \in \mathcal{Z}\}$
$\mathcal{Y} \supseteq m$
$\mathcal{Y} \supseteq f(\mathcal{W})$

then Transformation 5 adds the constraints

$\mathcal{X} \supseteq \{X : X \in m \wedge f(X) \in \mathcal{Z}\}$,
$\mathcal{X} \supseteq \{X : X \in f(\mathcal{W}) \wedge f(X) \in \mathcal{Z}\}$. (7.37)

These constraints are unnecessary because the compaction transformation
only requires that the left hand side of a quantified condition be a variable;
the right hand side does not have to be a non-variable expression. Moreover,
these extra constraints can lead to further redundant constraints, and may
introduce unnecessary intersection variables. To illustrate this, suppose that
the only lower bound for $\mathcal{Z}$ is $\mathcal{Z} \supseteq f(\mathcal{Y})$. Substituting this constraint into
the constraints (7.37) eventually leads to the constraint $\mathcal{X} \supseteq f(\mathcal{W}) \cap f(\mathcal{W})$,
and this may introduce a new intersection variable. Note that substituting
$\mathcal{Z} \supseteq f(\mathcal{Y})$ into $\mathcal{X} \supseteq \{X : X \in \mathcal{Y} \wedge f(X) \in \mathcal{Z}\}$ leads to
$\mathcal{X} \supseteq \{X : X \in \mathcal{Y}\}$, which via compaction yields $\mathcal{X} \supseteq \mathcal{Y}$, and by further substitution to
$\mathcal{X} \supseteq m$ and $\mathcal{X} \supseteq f(\mathcal{W})$. To avoid such redundant substitution steps,
Transformation 5 can be modified so that $s$ is required to be a non-variable
program term.

Second, consider the redundancies inherent in the original constraints
$SC_P$. In particular, $SC_P$ contains many groups of constraints of the form

$\mathcal{X}^1 \supseteq \{X_1 : conj\}, \ldots, \mathcal{X}^n \supseteq \{X_n : conj\}$ (7.38)

Now, during execution of the algorithm, the occurrences of conj are treated
in an identical manner in the sense that if a transformation is applied to
one occurrence, then it can be applied to another. Hence, the work of
"solving" conj is duplicated for each occurrence in the initial constraints. It
is therefore appropriate to consider grouping these constraints together into
a form such as

$(\mathcal{X}^1, \ldots, \mathcal{X}^n) \supseteq \{(X_1, \ldots, X_n) : conj\}$

whose meaning is identical to the constraints (7.38). The only main change
required for this new kind of constraint involves the compaction transformation.
Specifically, this transformation becomes:

If  C  contains  (A'^,...,A^)  D  {(Xi,...,X„)  :  conJ}  such  that 
conj  is  in  compaction  form  and  lm(explicit(C))(f!l  conj)  is  non¬ 
empty,  for  each  X  €  {A'^,...,A'“},  then  output  the  constraints 
X  D  conj,  for  each  X  €  {X^,. . . , .V"}. 
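
The grouping step described above can be sketched in Python. This is only an illustration under stated assumptions, not the thesis's implementation: the function name group_by_conj, the triple encoding (set variable, program variable, conj) and the use of a frozenset of condition strings for conj are all hypothetical choices made for the example.

```python
from collections import defaultdict


def group_by_conj(constraints):
    """Group constraints (set_var, prog_var, conj) that share the same conj.

    Assumption for this sketch: `conj` is hashable (a frozenset of
    condition strings). Returns a dict mapping each conj to a pair
    (set_vars, prog_vars) of tuples, one grouped constraint per conj.
    """
    groups = defaultdict(lambda: ([], []))
    for set_var, prog_var, conj in constraints:
        set_vars, prog_vars = groups[conj]
        set_vars.append(set_var)
        prog_vars.append(prog_var)
    return {conj: (tuple(s), tuple(p)) for conj, (s, p) in groups.items()}


# Hypothetical input: two constraints share one conjunction, a third does not.
shared = frozenset({"X in Y1", "f(X, Z) in Y2"})
other = frozenset({"W in Y3"})
grouped = group_by_conj([
    ("X1", "X", shared),
    ("X2", "Z", shared),
    ("X3", "W", other),
])
```

With this encoding, each distinct conj appears once, so any work done on it is shared by all the set variables grouped with it.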

We also observe that the bounds described in the termination argument
(see Lemma 18) can be significantly tightened. By doing so, it is fairly
straightforward to obtain an EXPTIME bound on the execution of the algorithm.
We note that EXPTIME bounds have been reported by Frühwirth,
Shapiro, Vardi and Yardeni [18] for a related class of set constraints.

We conclude this chapter with a discussion of the literature related to set
constraint algorithms. Early work by Reynolds [63] describes a simplification
algorithm for set constraints involving projection. The motivation for
this work was the inference of data type definitions in a first order functional
language. Subsequently, a similar algorithm was independently developed
by Jones and Muchnick [32]. In essence, these algorithms consist of
Transformations 1, 2 and 3. Further work by [30, 31, 50, 56] has extended the basic
approach to higher order functions, binding time analysis and analysis for
compile-time garbage collection and globalization of function parameters.
Again, projection is the only operator employed.

Constraints involving notions of both intersection and projection were
first used by Mishra [48] to approximate the success set of a logic program.
Specifically, the constraints contained intersection and a form of projection.
However, for algorithmic reasons, an approximate form of union was used
(see  the  discussion  of  hi  in  Section  5.6,  page  134).  In  essence,  this  restriction 
ensured  that  the  set  of  atoms  and  terms  that  could  be  defined  were  tuple- 
distributive  in  the  sense  that  they  were  closed  under  the  ★  operator  defined 
on  page  134.  Moreover,  only  partial  algorithms  were  given. 

The first decidability results for set constraints were obtained by Heintze
and Jaffar in [21, 22]. In [21], an algorithm was presented for solving constraints
involving quantified expressions of the form $\{X : s \in se\}$ where $s$ is
a program term constructed from program variables and function symbols.
The purpose of this was to obtain a simple and decidable approximation to
the success set of a logic program (the approximation defined is equivalent
to bottom-up $sba_P$). In [22] the set constraint calculus was formalized and
studied in an abstract setting (most of our definitions and notation for
set constraints are taken from this paper). Its main result was a decision
procedure to determine the satisfiability of definite set constraints, which
are constraints of the form $a \supseteq se$ where $a$ is a set expression that contains
no set operators and $se$ is a set expression whose set operators are projections
and intersections. Collections of definite constraints have least models
whenever they are satisfiable.

Soon after writing [21, 22], we discovered an alternative proof of the
results therein using a reduction to a result by File [16]. To motivate this
reduction, first note that for some programs $P$, bottom-up set based analysis
is exact in the sense that, using bottom-up semantics, $sba_P = lm(SC_P)$. A
syntactic characterization of a class of programs with this property is given in
[21]; call this class EXACT(sba). Now, the main result of [21] essentially shows
that bottom-up $sba_P$ is decidable and is a regular set. As a corollary, the
success set of all programs in EXACT(sba) is decidable and regular. Moreover,
using the transformations described in [25], an arbitrary program $P$ can be
transformed to a program $P'$ in EXACT(sba) such that $sba_P = sba_{P'}$ (for
bottom-up semantics). This means that the problem of computing bottom-up
$sba_P$ is equivalent to the problem of computing $lm(SC_P)$ for programs in
EXACT(sba). Now, in [16], File defines a subclass of logic programs based
on an extended notion of tree automata called pattern replacing automata,
and shows that the success set of all programs in this class is decidable and
regular. The key step for using this result to prove the decidability and
regularity of bottom-up $sba_P$ is a transformation that maps any program in
EXACT(sba) into an equivalent program in File's class.

We now review some of the main works subsequent to ours. In [18],
Frühwirth, Shapiro, Vardi and Yardeni provided another proof of the decidability
of $sba_P$. Their proof uses a technique very similar to the above
reduction to File's result, although they were unaware of his result and
essentially gave an alternative proof of it. In [5] Aiken and Murphy presented
an algorithm for set constraints involving intersection and complement but
omitting projection.

Very recently, Bachmair, Ganzinger and Waldmann [8] have obtained
an elegant proof of the decidability of the satisfaction problem for a large
subclass of set constraints involving complementation, intersection and projection
(in particular, the class they consider properly contains the definite
constraints considered in [22] and the constraints considered in [5]). The
basis of their result is a translation from set constraints into predicate calculus
formulas constructed from monadic predicates, variables and quantifiers
(note that there are no function symbols). We briefly outline the approach.

It has long been recognized that there are close connections between set
constraints and various fragments of logic (for example, see [22], where it
is observed that results by Rabin [62] prove the decidability of monadic set
constraints). Such relationships exist because set based reasoning can be
expressed in the predicate calculus by regarding a monadic predicate as the
set of values on which it is true. Hence, a set constraint $\mathcal{X} \supseteq f(\mathcal{X}) \cup a$ can
be translated into the formula $P_X(a) \wedge (P_X(x) \Rightarrow P_X(f(x)))$, where $P_X$ is the
predicate introduced to capture the set variable $\mathcal{X}$.
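
To make the monadic reading concrete, the following Python sketch treats the predicate for the set variable as a set of terms and builds the part of the least model of the example constraint that uses at most a given number of applications of f. It is an illustration only: the function names least_model_upto and holds_p_x, the string encoding of terms, and the depth bound are assumptions introduced here, and the bound is needed because the full least model is infinite.

```python
def least_model_upto(depth):
    """Terms of the least model of X >= f(X) U a with at most `depth` f's."""
    terms = {"a"}        # the conjunct P_X(a)
    frontier = {"a"}
    for _ in range(depth):
        # the conjunct P_X(x) => P_X(f(x)): close the newest terms under f
        frontier = {"f(" + t + ")" for t in frontier}
        terms |= frontier
    return terms


def holds_p_x(term):
    """Decide P_X(term) in the least model: term must be f(f(...f(a)...))."""
    while term.startswith("f(") and term.endswith(")"):
        term = term[2:-1]
    return term == "a"
```

Every term produced by least_model_upto satisfies holds_p_x, which reflects the idea of regarding a monadic predicate as the set of values on which it is true.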

The key idea of [8] is the use of skolemization to establish a correspondence
between set constraints and a class of formulas that was shown decidable
by Löwenheim [42] (see [2] for a somewhat simpler proof). In essence,
set constraints are equivalent to predicate calculus formulas that are the
result of skolemizing a formula with monadic predicates that is in prenex
normal  form.  For  example,  consider  the  set  constraint  X  D  f{X)  where 
^  This  can  be  written  in  the  predicate  calculus  as 

$(P_X(x) \Rightarrow P_{f(X)}(x)) \wedge (P_{f(X)}(f(x)) \Leftrightarrow P_X(x)) \wedge (P_{f(X)}(a) \Leftrightarrow \mathit{false})$

where the predicates $P_X$ and $P_{f(X)}$ capture the values of $\mathcal{X}$ and $f(\mathcal{X})$
respectively. Now, this formula is just the result of skolemizing¹

¹Note that this formula contains only one occurrence of the variable $f$, and this occurrence
appears in the expression $P_{f(X)}(f)$, where $P_{f(X)}$ is a predicate symbol and $f$ is the bound variable

$\exists a\, \forall x\, \exists f\, ((P_X(x) \Rightarrow P_{f(X)}(x)) \wedge (P_{f(X)}(f) \Leftrightarrow P_X(x)) \wedge (P_{f(X)}(a) \Leftrightarrow \mathit{false}))$

which is in prenex normal form and consists of only variables and monadic
predicate symbols. The specific subclass of set constraints considered by [8]
can be characterized as constraints of the form $se_1 \supseteq se_2$ such that $se_1$ does
not contain the projection symbol. The general set constraint problem posed
in [22] (that is, arbitrary constraints involving complementation, intersection
and projection) is still open.

We now provide an algorithmic comparison between this chapter and
other works in the literature. The set constraint algorithm presented in
this chapter is based on algorithms developed by Heintze and Jaffar [21,
22, 23]. The main difference is that we consider more general quantified
set expressions. In particular, [21] was restricted to quantified expressions
that are of the form $\{X : conj\}$ such that each quantified condition in
conj is of the form $s \in se$ where $s$ is constructed from function symbols
and program variables, and $se$ is a set expression. [23] essentially considered
quantified conditions of the form $X \in \mathcal{X}$, together with the corresponding
apartness conditions, where $X$ is a program variable and $\mathcal{X}$ is a set variable.
In contrast, the algorithm presented in this chapter deals with quantified
conditions of the form $s \in se$, and with apartness conditions between $s$ and $se$,
where $s$ may contain function symbols, projections and program variables,
and $se$ is a set expression². It also deals with complement constants. The
extensions to quantified expressions are necessary because the imperative
language considered in this thesis is much more general than that used in
[23]. We note that it is the appearance of projections in quantified conditions
that necessitates the use of the safeness invariant.

There are two main alternatives to the approach we have adopted for
computing set program approximations. The first is based on the transformation
of set constraints into the class considered by File [16], and the
second is based on the transformation of set constraints into the monadic
predicate formulas considered by Bachmair, Ganzinger and Waldmann [8].
While these approaches involve simpler correctness proofs, the approach of
[22], which we have adapted, has a number of important advantages. Specifically,
it is more direct than the other methods and remains entirely within
the framework of set constraints. Moreover, the algorithms involved are

in question.

²Strictly speaking, for decidability reasons, the program term $s$ in $s \in se$ cannot contain
completely arbitrary combinations of function symbols, projections and variables. See page 197
for further details.

simpler and more intuitive (although the proofs are not). Moreover, it provides
greater flexibility, yields an explicit representation of the least model
of the constraints, and appears to be more amenable to implementation. We
now expand on these last three points.

The algorithm in [22] is very flexible in the sense that it can be extended
in a number of ways to deal with a variety of set operators arising from
the analysis of different programming languages (as has been exploited in
this thesis). In contrast, the other approaches typically involve translations
into (decidable subclasses of) other formal systems, and intuitive extensions
to the set constraints do not usually map into intuitive extensions in these
formal systems. This is either because the transformations themselves do
not make sense on the extensions, or else the translation of the extended
constraints gives rise to formulas in the formal system that do not satisfy
the relevant syntactic criteria required for decidability. For example, there
does not appear to be any way to extend [8] so that the reverse skolemization
transformation can be applied to constraints that contain quantified
set expressions. Similarly, although [16] can be used to solve set constraints
involving a restricted form of quantified set expressions, this method cannot
be extended to solve quantified set expressions involving apartness conditions
or quantified conditions of the form $s \in se$ where $s$ contains projection
symbols.

An explicit representation of the least model of the constraints is particularly
important for program analysis applications. This representation
provides the characterization of the structure of possible run-time values that
is needed for many compile-time code improvements. Such an explicit
representation is computed by the algorithm in [22] and the algorithm obtained
using [16]. In contrast, although the algorithm of [8] provides a method of
answering questions about the least model (including membership and
non-emptiness), it does not provide any notion of explicit representation of the
least model.

The set constraint algorithms based on [8] and [16] are complex and
highly combinatorial in nature and do not appear to provide a basis for
implementation. In contrast, the algorithm in [22] is very simple and appears
to be better suited to implementation. In particular, it can be reformulated
in such a way that simple operations such as projection can be treated
specially and implemented cheaply. More generally, because this algorithm
is very direct, it is easy to take advantage of the structural properties of set
constraints that arise from typical programs. This appears to be crucial for
practical implementation of set based analysis. On the other hand, because
the approaches based on [8] and [16] use involved transformations into other
formal systems, such properties are more difficult to exploit.


Chapter  8 

Implementation 


The algorithm described in Chapter 7 focussed on the issue of decidability
of set based analysis. In particular, numerous aspects of the algorithm were
designed for clarity rather than efficiency. As a result, a straightforward
implementation of this algorithm gives very poor performance. This chapter
describes the design and implementation of a prototype system for practical
set based analysis. In particular, we show that substantial progress can
be made by redesigning the algorithms, employing appropriate representation
techniques, and removing various forms of redundancy. We provide
empirical evidence to show that practical set based analysis is within reach.

8.1 Introduction


This  chapter  is  a  progress  report  on  an  ongoing  effort  to  incorporate  set 
based  analysis  in  an  experimental  compiler,  and  focuses  on  one  of  the  main 
uncertainties  of  set  based  analysis:  its  computational  cost.  It  is  clear  that 
solving  set  constraints  can  be  expensive  in  the  worst  case,  and  this  is  due  to 
the  exponential  behavior  of  the  intersection  operation  (see  [18]  for  a  formal 
account of the exponential behavior of one class of set constraints). However,
it is not clear whether worst case behavior is a good indication of the
practicality  of  set  based  analysis,  since  programs  rarely  exhibit  the  extremes 
of  behavior  used  in  worst  case  analysis.  For  example,  in  the  worst  case,  the 
arity  of  predicate  and  function  symbols  may  increase  linearly  with  the  size 
of  a  program,  but  this  is  rarely  the  case  in  practice.  Moreover,  many  pro¬ 
grams  have  a  very  hierarchical  structure  and  mutual  recursion  rarely  extends 
beyond  a  small  number  of  predicates. 

We address the question of practicality by developing and evaluating an
implementation of the set constraint simplification algorithm. For simplicity,
we shall restrict our attention to intersection-projection constraints (see
Section 7.5), and for convenience we shall write these constraints as
equalities. Specifically, the constraints considered in the implementation are of
the form $\mathcal{X}_1 = se_1, \ldots, \mathcal{X}_n = se_n$ where
$\mathcal{X}_1, \ldots, \mathcal{X}_n$ are distinct set variables,
and each $se_i$ is constructed from union, function symbols, set variables and
intersection and projection operators.

Corresponding to a program $P$, constraints of this form can be constructed
to analyze $P$, in a manner similar to the construction of $SC_P$. The
differences between these constraints and $SC_P$ are relatively minor, although
in general the constraints $SC_P$ are slightly more accurate. Most importantly,
the constraints used here provide a similar uniform treatment of structures,
and represent the core part of the more complex constraints. Also, the
proofs of correctness of the constraints can be adapted from the correctness
of $SC_P$. Since the focus of this chapter is on solving set constraints, and not
on the specific construction of constraints from a program, we shall omit the
full details of this construction, and instead give some examples. Throughout
this chapter we shall focus on logic programs because they tend to yield
constraints that are more difficult to solve.

The  method  for  constructing  constraints  from  a  program  is  essentially 

    p(X) ← q(X), r(X).
    q(a).
    q(f(Y)) ← q(Y).
    r(f(Z)).

    $Ret_p = p(\mathcal{X})$
    $Ret_q = q(a) \cup q(f(\mathcal{Y}))$
    $Ret_r = r(f(\mathcal{Z}))$
    $\mathcal{X} = q^{-1}(Ret_q) \cap r^{-1}(Ret_r)$
    $\mathcal{Y} = q^{-1}(Ret_q)$
    $\mathcal{Z} = \top$

Figure 8.1: Bottom-Up Set Constraints


the same as that outlined in Chapter 2 and then formalized in Chapter 4.
First, set variables are introduced to capture the sets of values of each
program variable at each point in the program. Then constraints are written
between these sets to safely approximate the local consistency conditions of
the program. Figure 8.1 illustrates the construction of constraints to
approximate the bottom-up semantics of a logic program. The main difference
between these constraints and $SC_P$ is that, for convenience, we have used
variables $Ret_p$, $Ret_q$ and $Ret_r$ to capture the sets of ground atoms in the
success set corresponding to the predicates $p$, $q$ and $r$ respectively. As
before, set variables $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ are used to
capture the sets of values for the program variables $X$, $Y$ and $Z$
respectively. For example, the constraint
$\mathcal{X} = q^{-1}(Ret_q) \cap r^{-1}(Ret_r)$ indicates that the set of values
for the program variable $X$ consists of those values $v$ such that $q(v)$ is in
$Ret_q$ and $r(v)$ is in $Ret_r$.


Figure 8.2 shows how constraints may be constructed for the analysis
of a logic program under a top-down left-to-right execution strategy.
Recalling the notation for program points, note that program point 3
indicates program execution just before $q(X)$ is called in the body of the rule
$p(X) \leftarrow q(X), r(X)$, point 4 indicates execution just before $r(X)$, and
point 5 indicates execution after both body atoms have succeeded. As in $SC_P$, a
set variable is introduced to describe the values of each program variable at
each program point. The set variables $\mathcal{X}^3$, $\mathcal{X}^4$ and
$\mathcal{X}^5$ respectively denote the values of $X$ at points 3, 4 and 5. The
variables $Call_p$, $Call_q$ and $Call_r$ have been introduced to capture the
possible calls to $p$, $q$ and $r$. For example, the constraint
$\mathcal{X}^4 = p^{-1}(Call_p) \cap q^{-1}(Ret_q)$ indicates that the values of
$X$ at point 4 consist of all $v$ such that $p(v)$ is in $Call_p$ and $q(v)$ is
in $Ret_q$.
Additional initial goals can be accommodated by appropriately modifying
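the equations for $Call_p$, $Call_q$ and $Call_r$. Figure 8.3 contains
another example of the construction of the top-down constraints, this
time involving the append program.


    ← ¹ p(W) ².
    p(X) ← ³ q(X) ⁴, r(X) ⁵.
    q(a) ⁶.
    r(Y) ⁷.

    $\mathcal{W}^1 = \top$
    $\mathcal{W}^2 = p^{-1}(Ret_p)$
    $\mathcal{X}^3 = p^{-1}(Call_p)$
    $\mathcal{X}^4 = p^{-1}(Call_p) \cap q^{-1}(Ret_q)$
    $\mathcal{X}^5 = p^{-1}(Call_p) \cap q^{-1}(Ret_q) \cap r^{-1}(Ret_r)$
    $\mathcal{Y}^7 = r^{-1}(Call_r)$
    $Ret_p = p(\mathcal{X}^5)$
    $Ret_q = q(a)$
    $Ret_r = r(\mathcal{Y}^7)$
    $Call_p = p(\mathcal{W}^1)$
    $Call_q = q(\mathcal{X}^3)$
    $Call_r = r(\mathcal{X}^4)$

Figure 8.2: Top-Down Set Constraints


    ← ¹ app(cons(b, nil), cons(c, nil), V) ².
    app(nil, W, W) ³.
    app(cons(X, L), Y, cons(X, Z)) ← ⁴ app(L, Y, Z) ⁵.

    $\mathcal{V}^1 = \top$
    $\mathcal{V}^2 = app^{-1}_{(3)}(Ret_{app})$
    $\mathcal{W}^3 = app^{-1}_{(2)}(Call_{app}) \cap app^{-1}_{(3)}(Call_{app})$
    $\mathcal{X}^4 = cons^{-1}_{(1)}(app^{-1}_{(1)}(Call_{app})) \cap cons^{-1}_{(1)}(app^{-1}_{(3)}(Call_{app}))$
    $\mathcal{L}^4 = cons^{-1}_{(2)}(app^{-1}_{(1)}(Call_{app}))$
    $\mathcal{Y}^4 = app^{-1}_{(2)}(Call_{app})$
    $\mathcal{Z}^4 = cons^{-1}_{(2)}(app^{-1}_{(3)}(Call_{app}))$
    $\mathcal{X}^5 = cons^{-1}_{(1)}(app^{-1}_{(1)}(Call_{app})) \cap cons^{-1}_{(1)}(app^{-1}_{(3)}(Call_{app}))$
    $\mathcal{L}^5 = cons^{-1}_{(2)}(app^{-1}_{(1)}(Call_{app})) \cap app^{-1}_{(1)}(Ret_{app})$
    $\mathcal{Y}^5 = app^{-1}_{(2)}(Call_{app}) \cap app^{-1}_{(2)}(Ret_{app})$
    $\mathcal{Z}^5 = cons^{-1}_{(2)}(app^{-1}_{(3)}(Call_{app})) \cap app^{-1}_{(3)}(Ret_{app})$
    $Ret_{app} = app(nil, \mathcal{W}^3, \mathcal{W}^3) \cup app(cons(\mathcal{X}^5, \mathcal{L}^5), \mathcal{Y}^5, cons(\mathcal{X}^5, \mathcal{Z}^5))$
    $Call_{app} = app(cons(b, nil), cons(c, nil), \mathcal{V}^1) \cup app(\mathcal{L}^4, \mathcal{Y}^4, \mathcal{Z}^4)$

Figure 8.3: The Append Program and Its Top-Down Constraints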


8.2  The  Basic  Set  Constraint  Algorithm 


We present a reformulation of the algorithm for intersection-projection
constraints from Section 7.5. The main difference from the algorithm in
Section 7.5 is that in this presentation we have attempted to minimize the number
of transformation steps that have to be performed by combining the
substitution, projection simplification and intersection simplification
transformations.

The first step of the algorithm is a preprocessing stage that puts the
constraints in a convenient form. This form essentially corresponds to a
restriction of standard form. Define that an intersector is of the form
$a_1 \cap \cdots \cap a_n$ where the $a_i$ are atomic set expressions. A
projector is of the form $f^{-1}_{(i)}(a)$ where $a$ is an atomic set expression.
A constraint is in restricted standard form if it is of the form
$\mathcal{X} = se_1 \cup \cdots \cup se_n$ where at most one
of the $se_i$ is either a projector or an intersector, and the remaining $se_i$
are atomic expressions. A collection of constraints
$\mathcal{X}_1 = se_1, \ldots, \mathcal{X}_m = se_m$ is
in restricted standard form if $\mathcal{X}_1, \ldots, \mathcal{X}_m$ are distinct
set variables, and each constraint is in restricted standard form.

The method for converting constraints to restricted standard form involves
repeatedly identifying an occurrence of a set expression that does not
satisfy the restricted standard form definition, replacing it by a new variable
and then adding a new equation between the new variable and the replaced
set expression. The details are very similar to the conversion of constraints
to standard form in Section 7.3, page 174. Importantly, given a collection
$C$ of constraints of the form
$\mathcal{X}_1 = se_1, \ldots, \mathcal{X}_m = se_m$ where
$\mathcal{X}_1, \ldots, \mathcal{X}_m$ are distinct variables, the resulting
constraints, call them $C_0$, are in restricted standard form, and
$lm(C) =_{var(C)} lm(C_0)$.

The bulk of the algorithm consists of the exhaustive application of four
transformations to restricted standard form constraints. Before presenting
these transformations, it is convenient to represent constraints using an array
indexed by set variables. Specifically, let $C$ be a collection of restricted
standard form constraints, and define $rhs_C$ as follows

$$rhs_C(\mathcal{X}) \stackrel{\mathrm{def}}{=} \begin{cases} \{se_1, \ldots, se_n\} & \text{if } C \text{ contains } \mathcal{X} = se_1 \cup \cdots \cup se_n \\ \{\} & \text{otherwise} \end{cases}$$

where $\mathcal{X}$ ranges over the set variables in $C$. It is convenient to
extend this notation to atomic set expressions, so that $rhs(a)$ denotes
$rhs(\mathcal{X})$ if $a$ is the set variable $\mathcal{X}$, and $rhs(a)$ denotes
$\{a\}$ if $a$ is not a set variable. We can now describe the first three
transformation steps.

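Transformation 15 If $\mathcal{Y} \in rhs(\mathcal{X})$ and
$a \in rhs(\mathcal{Y})$ is an atomic expression then add $a$ to
$rhs(\mathcal{X})$. □

Transformation 16 If $f^{-1}_{(i)}(a) \in rhs(\mathcal{X})$ and $rhs(a)$
contains $\top$ then add $\top$ to $rhs(\mathcal{X})$. □

Transformation 17 If $f^{-1}_{(i)}(a) \in rhs(\mathcal{X})$ and $rhs(a)$
contains $f(a_1, \ldots, a_n)$ then add $a_i$ to $rhs(\mathcal{X})$ if
$lm(explicit(C))(f(a_1, \ldots, a_n)) \neq \{\}$. □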

The last transformation simplifies intersectors. Again we use the $V_N$
naming scheme for any new variables introduced during the simplification
of intersections.

Transformation 18 If $a_1 \cap \cdots \cap a_m \in rhs(\mathcal{X})$ and there
exists a sequence $a'_1, \ldots, a'_m$ such that $a'_i \in rhs(a_i)$, $i = 1..m$,
and for some $f \in \Sigma$, each $a'_i$ has the form $f(\cdots)$ or $\top$, then
let $a''_1, \ldots, a''_k$ be the elements of $a'_1, \ldots, a'_m$ of the form
$f(\cdots)$ and

•  if $k = 0$ then add $\top$ to $rhs(\mathcal{X})$;

•  if $k > 0$, let the arity of $f$ be $n$, let $a''_i$ be
   $f(a_{i,1}, \ldots, a_{i,n})$, $i = 1..k$, let
   $N_j = \mathcal{N}(a_{1,j}) \cup \cdots \cup \mathcal{N}(a_{k,j})$, $j = 1..n$, and

   -  add $f(V_{N_1}, \ldots, V_{N_n})$ to $rhs(\mathcal{X})$, and

   -  for each $j$, if $rhs(V_{N_j}) = \{\}$ then add
      $a_{1,j} \cap \cdots \cap a_{k,j}$ to $rhs(V_{N_j})$. □
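Input constraints:

    $\mathcal{W} = \mathcal{X} \cup f^{-1}_{(1)}(f^2(\mathcal{W}))$
    $\mathcal{X} = f(\mathcal{Y}) \cap \mathcal{Z}$
    $\mathcal{Y} = a$
    $\mathcal{Z} = f(a) \cup g(a)$

Steps:

    A.  Add $f(\mathcal{W})$ to $rhs(\mathcal{W})$;
    B.  Add $f(V_{\{\mathcal{Y},a\}})$ to $rhs(\mathcal{X})$, and
        set $rhs(V_{\{\mathcal{Y},a\}})$ to $\{\mathcal{Y} \cap a\}$;
    C.  Add $f(V_{\{\mathcal{Y},a\}})$ to $rhs(\mathcal{W})$;
    D.  Add $a$ to $rhs(V_{\{\mathcal{Y},a\}})$.

Figure 8.4: Example Execution of the Reformulated Algorithm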


The reformulated algorithm can now be stated as: exhaustively apply
those instances of Transformations 15-18 that change $rhs$, and on termination
output the explicit form constraints resulting from the deletion of all
intersectors, projectors and variables from $rhs$. The main difference between
this algorithm and the intersection-projection algorithm in Chapter 7 is that
the substitution, projection simplification and intersection simplification
transformations (Transformations 1, 3 and 4 respectively) have been combined
in an effort to minimize the number of transformation applications
performed during the algorithm. The proofs of termination and correctness
of the reformulated algorithm are similar to those for the
intersection-projection algorithm. In summary,

Theorem 10 Let $C$ be a collection of constraints in restricted standard
form. Then, the exhaustive application of Transformations 15-18 terminates.
Moreover, the resulting constraints $C'$ are such that
$lm(C) =_{var(C)} lm(explicit(C'))$. □

Figure 8.4 contains an example execution of the reformulated algorithm.
When input with the constraints shown in the figure, the reformulated
algorithm performs steps A, B, C and D, corresponding to applications of
Transformations 17, 18, 15 and 18 respectively. The final output of the
algorithm consists of the constraints
$\mathcal{W} = f(\mathcal{W}) \cup f(V_{\{\mathcal{Y},a\}})$,
$\mathcal{X} = f(V_{\{\mathcal{Y},a\}})$, $\mathcal{Y} = a$,
$\mathcal{Z} = f(a) \cup g(a)$, and $V_{\{\mathcal{Y},a\}} = a$.


8.3  Outline  of  Implementation 


The implementation of the preprocessing phase is fairly straightforward.
We shall instead concentrate on the implementation of the core part of the
algorithm. We first classify the kinds of expressions that can appear in the
sets $rhs(\mathcal{X})$. First, there are non-variable atomic expressions, and we
refer to these as constructed expressions and denote them by $ce$. In essence,
these are the central objects that are manipulated by the algorithm. Second,
there are the projectors, and these can be viewed as operators on constructed
expressions. For example, if $rhs(\mathcal{X})$ contains the constructed
expression $f(a_1, a_2)$, then $f^{-1}_{(2)}(\mathcal{X})$ produces $a_2$ (which
must either be a constructed expression or a variable). Specifically,
corresponding to each projector of the form $f^{-1}_{(i)}(\cdots)$, define a
partial function $proj_{f,i}$ as follows

$$proj_{f,i}(ce) = \begin{cases} a_i & \text{if } ce \text{ is } f(a_1, \ldots, a_n) \\ \top & \text{if } ce \text{ is } \top \\ \text{undefined} & \text{otherwise} \end{cases}$$

The third class of expressions consists of the intersectors. Again these
can be viewed as operators on constructed expressions, and corresponding to
each $m$-ary intersection expression, we define a partial function
$intersect_m$ as follows. Let $ce_1, \ldots, ce_m$ be a sequence of constructed
expressions such that, for some $f \in \Sigma$, each $ce_i$ has the form
$f(\cdots)$ or $\top$, let $a_1, \ldots, a_k$ be the elements of
$ce_1, \ldots, ce_m$ of the form $f(\cdots)$, let each $a_i$ be
$f(a_{i,1}, \ldots, a_{i,n})$, $i = 1..k$, let $N_j$ be
$\mathcal{N}(a_{1,j}) \cup \cdots \cup \mathcal{N}(a_{k,j})$, $j = 1..n$, and
define that

$$intersect_m(ce_1, \ldots, ce_m) = \begin{cases} \top & \text{if } k = 0 \\ f(V_{N_1}, \ldots, V_{N_n}) & \text{if } k > 0 \end{cases}$$

and that $intersect_m(ce_1, \ldots, ce_m)$ is undefined if the sequence
$ce_1, \ldots, ce_m$ is not of the specified form. The final kind of expressions
that can occur in $rhs$ are simply variables, and their main effect in the
algorithm is to directly transfer constructed expressions from one equation to
another.

On the basis of this classification, we represent the array $rhs$ as three
separate arrays, $con$, $var$ and $op$. Let $con(\mathcal{X})$ denote the set of
constructed expressions in $rhs(\mathcal{X})$, let $var(\mathcal{X})$ denote the
set of variables in $rhs(\mathcal{X})$ and let $op(\mathcal{X})$ denote the set
of projectors and intersectors in $rhs(\mathcal{X})$. For example, the equation
$\mathcal{X} = f^2(\mathcal{X}) \cup f(\mathcal{X}) \cup f^{-1}(\mathcal{X}) \cup \mathcal{Y}$
is represented as $con(\mathcal{X}) = \{f^2(\mathcal{X}), f(\mathcal{X})\}$,
$op(\mathcal{X}) = \{f^{-1}(\mathcal{X})\}$, and
$var(\mathcal{X}) = \{\mathcal{Y}\}$. Note that the preprocessing phase ensures
that initially $op(\mathcal{X})$ is either empty or a singleton set, and that
the algorithm never alters $op(\mathcal{X})$. Hence we shall treat
$op(\mathcal{X})$ as either the empty set, or a projector or an intersector.
Corresponding to the convention that $rhs(a)$ is just $\{a\}$ if $a$ is not a set
variable, we shall similarly define that $con(a)$ is $\{a\}$ if $a$ is not a set
variable. Using this new notation, Transformations 15-18 can be rephrased as:

•  If $op(\mathcal{X})$ is $f^{-1}_{(i)}(\mathcal{Y})$,
   $ce \in con(\mathcal{Y})$ and $nonempty(ce)$, then if $proj_{f,i}(ce)$
   is a variable, add it to $var(\mathcal{X})$, and otherwise add it to
   $con(\mathcal{X})$.

•  If $op(\mathcal{X})$ is $a_1 \cap \cdots \cap a_m$ and $ce_i \in con(a_i)$,
   $i = 1..m$, then add $intersect_m(ce_1, \ldots, ce_m)$ to $con(\mathcal{X})$,
   and construct appropriate equations for any new intersection variables
   introduced.

•  If $\mathcal{Y} \in var(\mathcal{X})$ and $ce \in con(\mathcal{Y})$, then add
   $ce$ to $con(\mathcal{X})$.

where "add" is used informally to mean that if the operation is defined then
add the resulting atomic expression to the specified set if it does not already
appear there, and $nonempty(ce)$ denotes a function that determines whether
$ce$ is non-empty in $lm(explicit(C))$ (we shall address the implementation of
$nonempty$ later in this section).
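
The classification into con, var and op, together with the partial function
for projectors, can be sketched as follows. This is only an illustration under
assumed encodings that do not come from the text: set variables are Python
strings, a constructed expression f(a1, ..., an) is the tuple
("f", a1, ..., an), a projector is ("proj", f, i, a), an intersector is
("inter", a1, ..., an), and None stands for "undefined". The names proj,
split_rhs and TOP are hypothetical.

```python
TOP = ("<top>",)  # assumed stand-in for the universal set written T in the text


def proj(f, i, ce):
    """Partial function proj_{f,i}; returns None where it is undefined."""
    if ce == TOP:
        return TOP
    if isinstance(ce, tuple) and ce[0] == f and 1 <= i < len(ce):
        return ce[i]  # the i-th argument (1-based) of f(a1, ..., an)
    return None


def split_rhs(rhs):
    """Split the expressions of rhs(X) into (con, var, op).

    op is None or the single projector/intersector, since preprocessing
    leaves at most one operator per equation.
    """
    con, var, op = set(), set(), None
    for e in rhs:
        if isinstance(e, str):            # a set variable
            var.add(e)
        elif e[0] in ("proj", "inter"):   # an operator on constructed expressions
            op = e
        else:                             # a constructed expression
            con.add(e)
    return con, var, op
```

Because "proj" and "inter" are reserved as operator tags in this encoding, a
program function symbol with either name would need a different tag.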

Our implementation is essentially based on the above formulation of
transformations. Two key observations about this formulation motivate a
number of major implementation design decisions. First, a frequent operation
in the algorithm involves comparison of constructed expressions, and
in particular, the determination of whether a new constructed expression is
already an element of a set. Second, at any particular instance, there may
be many possible steps, but only a few of these are likely to be productive.
It is therefore important to be able to focus on those steps that are likely to
be productive. We now elaborate.


Representation  of  Constructed  Expressions 

To provide the cheap comparison for constructed expressions, we code each
such expression as an integer in the following simple manner. As each variable
is encountered, a unique positive integer is allocated for it. Notationally
we shall write $\#\mathcal{X}$ to denote the integer associated with the variable
$\mathcal{X}$. Function and constant symbols are also identified by positive
integers; we write $\#f$ to denote the integer for the function symbol $f$. The
coding of constructed expressions is essentially performed by incrementally
building a mapping $\mathcal{C}$ from sequences of integers into negative
integers ($\mathcal{C}$ is initially empty). Specifically, the coding $\#ce$ of a
constructed expression $ce$ is achieved as follows. Let $ce$ be
$f(a_1, \ldots, a_n)$ and first compute the sequence
$\#f, \#a_1, \ldots, \#a_n$. Then, return
$\mathcal{C}(\#f, \#a_1, \ldots, \#a_n)$ if it is already defined, and otherwise
allocate a new negative integer, say $j$, update $\mathcal{C}$ appropriately and
then return $j$. An array is also maintained to map each coding
into the sequence it represents. This is used particularly by the projection
operation.
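
A minimal sketch of this coding scheme is given below. It is not the
prototype's code: the class name Coder and its methods are hypothetical, and
the choice to draw variable and function-symbol integers from one shared
counter is an assumption made for brevity.

```python
class Coder:
    """Positive integers for variables and symbols, negative for constructed expressions."""

    def __init__(self):
        self._symbols = {}     # ("v", name) or ("f", name) -> positive integer
        self._table = {}       # the mapping C: (#f, #a1, ..., #an) -> negative integer
        self._decode = [None]  # index -code -> sequence; slot 0 is unused

    def _symbol(self, key):
        if key not in self._symbols:
            self._symbols[key] = len(self._symbols) + 1
        return self._symbols[key]

    def var(self, name):
        """#X for a set variable."""
        return self._symbol(("v", name))

    def fun(self, name):
        """#f for a function or constant symbol."""
        return self._symbol(("f", name))

    def code(self, f, *arg_codes):
        """#ce for ce = f(a1, ..., an), given the codes of its arguments."""
        key = (self.fun(f),) + tuple(arg_codes)
        if key not in self._table:
            self._decode.append(key)
            self._table[key] = -(len(self._decode) - 1)  # fresh negative integer
        return self._table[key]

    def sequence(self, code):
        """Inverse array used by projection: coding -> (#f, #a1, ..., #an)."""
        return self._decode[-code]
```

With this coding, two constructed expressions are equal exactly when their
integers are equal, and a common subexpression is stored once.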

Coding constructed expressions as integers provides a very compact
representation since there is maximal sharing between constructed expressions
that have common subexpressions. It also allows the set $con(\mathcal{X})$ to be
represented efficiently. Specifically, we represent the relationship
$a \in con(\mathcal{X})$ using ordered pairs $(\#a, \#\mathcal{X})$, and this in
turn is implemented using a hash table. Similar comments apply to a number of
other operations of the algorithm, and hashing techniques are heavily used
throughout. Although fairly simple methods have proved effective for moderate
sized problems, it is likely that specialized hashing techniques will be
important for handling very large analysis problems.

However, the use of the coding represents a tradeoff. Whilst operations
such as comparison of constructed expressions are dramatically improved,
the operation of projection becomes slightly more expensive and also there
are overheads in initializing and maintaining the data structures used. Thus
far, these costs appear to be insignificant compared to the substantial
improvements over a very early implementation using an explicit PROLOG-style
term representation.


Dependency  Directed  Updating 

We now address the issue of focusing on productive steps. Note that once a
particular instance of a step has been applied, it never needs to be applied
again. For example, if $op(\mathcal{X})$ is the projector
$f^{-1}_{(1)}(\mathcal{Y})$ and the constructed expression $f(b,c)$ appears in
$con(\mathcal{Y})$, then once $b$ has been added to $con(\mathcal{X})$,
we never again need to apply $op(\mathcal{X})$ to $f(b,c)$. We exploit this by
using a dependency directed updating scheme. In other words we only apply
operations to new constructed expressions. Specifically, we define an array
$dep$ such that $dep(\mathcal{X})$ is the set consisting of
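
    $\mathcal{X} = b \cup f(\mathcal{X}) \cup \mathcal{Z}$
    $\mathcal{Y} = c \cup (\mathcal{X} \cap \mathcal{Z})$
    $\mathcal{Z} = d \cup f^{-1}_{(1)}(\mathcal{X})$

    Variable         con                    var              op                           dep
    $\mathcal{X}$    $b, f(\mathcal{X})$    $\mathcal{Z}$    (empty)                      $\cap\_dep(\mathcal{Y}), proj\_dep(\mathcal{Z})$
    $\mathcal{Y}$    $c$                    (empty)          $\mathcal{X} \cap \mathcal{Z}$   (empty)
    $\mathcal{Z}$    $d$                    (empty)          $f^{-1}_{(1)}(\mathcal{X})$      $var\_dep(\mathcal{X}), \cap\_dep(\mathcal{Y})$

Figure 8.5: Example Constraints and Their Representation
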
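$$\begin{array}{ll}
 & \{var\_dep(\mathcal{Y}) : \mathcal{X} \in var(\mathcal{Y})\} \\
\cup & \{proj\_dep(\mathcal{Y}) : f^{-1}_{(i)}(\mathcal{X}) \in op(\mathcal{Y})\} \\
\cup & \{\cap\_dep(\mathcal{Y}) : (a_1 \cap \cdots \cap a_n) \in op(\mathcal{Y}) \text{ where some } a_j \text{ is } \mathcal{X}\}
\end{array}$$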

We refer to items of the form $var\_dep(\mathcal{Y})$, $proj\_dep(\mathcal{Y})$
or $\cap\_dep(\mathcal{Y})$ as dependencies. The set of dependencies for a
variable $\mathcal{X}$ indicates where any new constructed expressions for
$\mathcal{X}$ have to be propagated. Figure 8.5 contains an example collection
of constraints and its representation in terms of arrays $con$, $var$, $op$ and
$dep$. Using this representation, the algorithm can now be described as in
Figure 8.6. The algorithm is initiated by inspecting each variable
$\mathcal{X}$ and calling $add(proj_{f,i}(ce), \mathcal{X})$ for each
$ce \in rhs(a)$ if $op(\mathcal{X})$ is $f^{-1}_{(i)}(a)$, and calling
$add(intersect_n(ce_1, \ldots, ce_n), \mathcal{X})$ for each sequence
$(ce_1, \ldots, ce_n)$ such that $ce_j \in rhs(a_j)$, if $op(\mathcal{X})$ is
$a_1 \cap \cdots \cap a_n$.
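
The dependency directed scheme can be sketched as follows for the variable and
projector cases only. This is a simplified illustration and not the prototype:
intersectors and the non-emptiness test are deliberately left out, set
variables are assumed to be Python strings, a constructed expression
f(a1, ..., an) is assumed to be the tuple ("f", a1, ..., an), and the names
solve, add, add_var and update are hypothetical.

```python
from collections import defaultdict

TOP = ("<top>",)  # assumed stand-in for the universal set written T in the text


def proj(f, i, ce):
    """Partial function proj_{f,i}; None plays the role of 'undefined'."""
    if ce == TOP:
        return TOP
    if isinstance(ce, tuple) and ce[0] == f and 1 <= i < len(ce):
        return ce[i]
    return None


def solve(init_con, init_var, op):
    """init_con, init_var: variable -> iterable; op: variable -> (f, i, source variable)."""
    con = defaultdict(set)
    var = defaultdict(set)
    dep = defaultdict(list)

    def add(ce, x):
        if ce is None or ce in con[x]:
            return                      # undefined, or already present: nothing to do
        con[x].add(ce)
        for d in list(dep[x]):          # propagate only this new expression
            update(ce, d)

    def add_var(y, x):
        """Record y in var(x), register the dependency, and replay con(y)."""
        if y in var[x]:
            return
        var[x].add(y)
        dep[y].append(("var_dep", x))
        for ce in list(con[y]):
            add(ce, x)

    def update(ce, d):
        if d[0] == "var_dep":
            add(ce, d[1])
        else:                           # ("proj_dep", y, f, i)
            _, y, f, i = d
            result = proj(f, i, ce)
            if isinstance(result, str):
                add_var(result, y)      # projection produced a variable
            else:
                add(result, y)

    for y, (f, i, x) in op.items():
        dep[x].append(("proj_dep", y, f, i))
    for x, ys in init_var.items():
        for y in ys:
            add_var(y, x)
    for x, ces in init_con.items():
        for ce in ces:
            add(ce, x)
    return {x: set(s) for x, s in con.items()}
```

Each constructed expression is pushed through a given dependency at most once,
which is the point of the scheme: no step is re-applied to old expressions.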


Intersection  Variables  and  Non-Empty 

We conclude by discussing intersection variables and the function $nonempty$.
The generation and management of intersection variables is straightforward.
Corresponding to the function $\mathcal{N}$, an array is used to record the set
of atomic set expressions corresponding to each intersection variable. The
mapping from a set of atomic set expressions $\{a_1, \ldots, a_n\}$ into an
intersection variable is performed by first constructing
$\mathcal{N}(a_1) \cup \cdots \cup \mathcal{N}(a_n)$. The elements of this
set are then sorted and the resulting sequence is looked up in a hash table,
where each entry of this table consists of a sorted list of atomic expressions
and the corresponding intersection variable. If the sequence is found in the

    update(ce, dep, X) {
        if dep is var_dep(Y) then add(ce, Y);
        if dep is proj_dep(Y) and op(Y) is f⁻¹₍ᵢ₎(X) then add(proj_{f,i}(ce), Y);
        if dep is ∩_dep(Y) and op(Y) is a_1 ∩ ··· ∩ a_n then
            let X be a_i;  /* note that X must appear in a_1, ..., a_n */
            foreach sequence (ce_1, ..., ce_{i-1}, ce, ce_{i+1}, ..., ce_n)
                    such that ce_j ∈ rhs(a_j), j ≠ i
                add(intersect_n(ce_1, ..., ce_{i-1}, ce, ce_{i+1}, ..., ce_n), Y);
                construct appropriate equations for any new intersection variables;
    }

    add(ce, X) {
        if ce ∉ con(X) then
            con(X) := con(X) ∪ {ce};
            foreach dep' ∈ dep(X)
                update(ce, dep', X);
    }

Figure 8.6: Dependency-Based Algorithm




table, the appropriate intersection variable is returned. If the sequence is
not found, then a new variable $V_N$ is generated and the table is updated
appropriately. Also a new equation is generated by setting
$var(V_N) := \{\}$, $con(V_N) := \{\}$ and
$op(V_N) := \{a_1 \cap \cdots \cap a_n\}$ where $N$ is
$\mathcal{N}(a_1) \cup \cdots \cup \mathcal{N}(a_n)$,
and updating the dependencies for each variable $\mathcal{X}$ in
$\{a_1, \ldots, a_n\}$ using
$dep(\mathcal{X}) := dep(\mathcal{X}) \cup \{\cap\_dep(V_N)\}$, and then
finally, if each $a_i$ is a non-variable atomic expression, calling
$add(intersect_n(a_1, \ldots, a_n), V_N)$.
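
The table lookup for intersection variables can be sketched as below. The
sketch assumes, without confirmation from this chapter, that the function N
maps an intersection variable to the set of atomic expressions it stands for
and maps any other atomic expression a to {a}. The class name
IntersectionVars, its methods, and the generated names V1, V2, ... are
hypothetical; atomic expressions are assumed to be hashable Python values.

```python
class IntersectionVars:
    """Maps a set of atomic expressions to a unique intersection variable."""

    def __init__(self):
        self._by_key = {}  # sorted tuple of atoms -> intersection variable
        self._atoms = {}   # intersection variable -> frozenset of atoms

    def atoms_of(self, a):
        """N(a): atoms behind an intersection variable, else {a} (assumed)."""
        return self._atoms.get(a, frozenset([a]))

    def lookup(self, atoms):
        """Return (variable, is_new) for the intersection of the given atoms."""
        n = frozenset().union(*(self.atoms_of(a) for a in atoms))
        key = tuple(sorted(n, key=repr))  # sort so that equal sets give equal keys
        if key in self._by_key:
            return self._by_key[key], False
        v = "V%d" % (len(self._by_key) + 1)
        self._by_key[key] = v
        self._atoms[v] = n
        return v, True
```

When lookup reports a new variable, the caller would go on to create its
equation and dependencies as described above; that part is omitted here.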

We now address the issue of the non-empty condition in the projector
step. The algorithm used is essentially a variation of the algorithm
to determine non-emptiness described in Section 7.1. However, the previous
algorithm is modified so that it incrementally computes the mapping
$nonempty$, instead of recomputing it each time it is needed. As before, let
$nonempty$ be an array that maps each atomic expression $a$ to a boolean
value that is true if $a$ is currently known to be non-empty. At the start
of the algorithm, each entry in $nonempty$ is set to true iff $a$ is a ground
atomic expression that is non-empty under all interpretations. The value of
$nonempty(f(a_1, \ldots, a_n))$ is updated to true if $nonempty(a_i)$ is true
for all $1 \le i \le n$. The value of $nonempty(\mathcal{X})$ for a variable
$\mathcal{X}$ is updated to true if $con(\mathcal{X})$ contains a non-empty
atomic expression. This updating is managed using lists of dependencies. For
each atomic expression $a$, we maintain the list of the atomic expressions
directly dependent on $nonempty(a)$. When $nonempty(a)$ changes from false to
true, the atomic expressions dependent on $a$ are recomputed. In essence, the
computing of $nonempty$ is merged into the rest of the algorithm. Hence,
whenever we need to determine whether $a$ is currently non-empty, we just
inspect $nonempty(a)$; there is no explicit call to the $nonempty$ function.
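
The incremental maintenance of nonempty can be sketched as follows. This is an
illustration only: the universal set is not handled, set variables are assumed
to be Python strings, a constructed expression f(a1, ..., an) is assumed to be
the tuple ("f", a1, ..., an), and the class NonEmpty with its methods is
hypothetical.

```python
from collections import defaultdict


class NonEmpty:
    """Tracks which atomic expressions are currently known to be non-empty."""

    def __init__(self):
        self.known = set()                  # atoms with nonempty(a) = true
        self.dependents = defaultdict(set)  # atom -> atoms to recheck when it flips
        self.con = defaultdict(set)         # con(X) as seen by this component

    def _holds(self, a):
        if isinstance(a, str):              # variable: some element of con is non-empty
            return any(ce in self.known for ce in self.con[a])
        return all(arg in self.known for arg in a[1:])  # term: all arguments non-empty

    def _mark(self, a):
        stack = [a]
        while stack:
            b = stack.pop()
            if b in self.known:
                continue
            self.known.add(b)               # false -> true, never back
            for d in self.dependents[b]:
                if self._holds(d):
                    stack.append(d)

    def register(self, term):
        """Record a constructed expression and its dependency on its arguments."""
        if isinstance(term, str):
            return
        for arg in term[1:]:
            self.register(arg)
            self.dependents[arg].add(term)
        if self._holds(term):
            self._mark(term)

    def add_to_con(self, ce, x):
        """Mirror of adding ce to con(x)."""
        self.register(ce)
        self.con[x].add(ce)
        self.dependents[ce].add(x)
        if ce in self.known:
            self._mark(x)
```

Querying non-emptiness is then a set membership test on known, with no
recomputation.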

We now address the issue of how to incorporate the non-empty condition
in the projector updating step. Recall that the step is: if $op(\mathcal{X})$ is
$f^{-1}_{(i)}(\mathcal{Y})$, $ce \in con(\mathcal{Y})$ and $nonempty(ce)$, then
add $proj_{f,i}(ce)$ to $con(\mathcal{X})$ or $var(\mathcal{X})$.
Now, there is no difficulty if we find that $nonempty(ce)$ is true, since we
can then proceed with the updating step. However, if $nonempty(ce)$ is false
then we need to ensure that if $nonempty(ce)$ ever becomes true, then the
projector updating step is eventually completed. This is achieved by adding
a new kind of dependency to the non-empty dependency lists.
8.4  Intersectors


The previous section outlined a rudimentary implementation, which
although a significant improvement over a naive implementation, is still
impractical. The main reason for this is intersectors, which are perhaps the
most problematic part of the implementation. Not only are they responsible
for introducing new variables, but updating intersectors has a combinatorial
nature. For example, consider an intersector $a_1 \cap \cdots \cap a_n$, and
suppose that $ce$ has been freshly added to $con(a_i)$. Then, in updating this
intersector using this constructed expression, we must consider all possible
combinations of $ce_j \in con(a_j)$, $j \neq i$.

A first step in dealing with this problem is to partition the constructed
expressions in $con(\mathcal{X})$ according to their principal function symbol.
For example, consider updating the intersector
$\mathcal{X} \cap \mathcal{Y} \cap \mathcal{Z}$ using a newly constructed
expression $f(\mathcal{Y})$ for $\mathcal{Z}$. If
$con(\mathcal{X}) = \{f(\mathcal{X}), f(\mathcal{W}), g(\mathcal{X})\}$ and
$con(\mathcal{Y}) = \{f(\mathcal{Z}), g(\mathcal{Y})\}$, then clearly we only
need to consider the intersections
$f(\mathcal{X}) \cap f(\mathcal{Z}) \cap f(\mathcal{Y})$ and
$f(\mathcal{W}) \cap f(\mathcal{Z}) \cap f(\mathcal{Y})$, and we can ignore
intersections such as $f(\mathcal{X}) \cap g(\mathcal{Y}) \cap f(\mathcal{Y})$,
since they will always be empty. Although this very simple modification is
useful, more fundamental changes are required to implement intersection
efficiently. In essence we need to exploit the substantial redundancy that is
typically present in set constraints.
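
A minimal sketch of this partitioning is shown below. It assumes constructed
expressions are tuples ("f", a1, ..., an) whose first component is the
principal function symbol, and it ignores the universal set, which the
transformations allow to combine with any symbol. The function names are
hypothetical.

```python
from collections import defaultdict
from itertools import product


def by_symbol(con_set):
    """Partition a con set by principal function symbol."""
    groups = defaultdict(list)
    for ce in con_set:
        groups[ce[0]].append(ce)
    return groups


def candidate_combinations(new_ce, other_cons):
    """Yield only the combinations that share new_ce's principal symbol.

    other_cons holds the con sets of the remaining operands of the intersector.
    """
    f = new_ce[0]
    pools = [by_symbol(c).get(f, []) for c in other_cons]
    for combo in product(*pools):
        yield (new_ce,) + combo
```

For the example above, the new expression f(Y) for Z combines with only two of
the six pairs drawn from con(X) and con(Y).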


Minimal  DNF  Expansion 

Consider the intersector $\mathcal{X} \cap \mathcal{Y} \cap \mathcal{Z}$ where
$con(\mathcal{X})$, $con(\mathcal{Y})$ and $con(\mathcal{Z})$ are
respectively $\{A, B, C\}$, $\{A, C, D\}$ and $\{A, B, D\}$, where $A$, $B$, $C$
and $D$ are distinct constructed expressions. In essence, we wish to compute

$$(A \cup B \cup C) \cap (A \cup C \cup D) \cap (A \cup B \cup D).$$

A naive expansion of this expression would lead to 27 intersections:
$A \cap A \cap A$, $A \cap A \cap C$, etc. However for this example, we only need
to compute the expressions $A$, $B \cap D$, $C \cap B$ and $C \cap D$, and so we
can reduce 27 intersections to just three. The problem of minimizing the number
of expressions that must be computed can be viewed as a special case of
computing a minimal DNF form, given an expression in CNF. A number of different
approaches were tested. The essence of the current approach is to precompute
information about the pattern of occurrences of constructed terms that appear
in more than one "disjunction", and then to use this information to
sequentially build up minimal conjunctions.
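
The example can be checked with a small sketch. The code below is a plain
expand-and-absorb computation, chosen for brevity; it is not the
precomputation-based method described above, and the function name
minimal_dnf is hypothetical.

```python
def minimal_dnf(clauses):
    """Given a CNF as a list of sets of literals, return the minimal conjunctions.

    Each result is a frozenset of literals; no result contains another.
    """
    terms = {frozenset()}
    for clause in clauses:
        # Distribute the next union over the conjunctions built so far.
        expanded = {t | {lit} for t in terms for lit in clause}
        # Absorption: a conjunction that strictly contains another is redundant.
        terms = {t for t in expanded if not any(o < t for o in expanded)}
    return terms
```

On the three unions of the example this yields the four conjunctions A, B∩D,
B∩C and C∩D, of which only the last three require an intersection to be
computed.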

We now outline how this approach to intersection is incorporated in the
implementation. It is clear that updating intersectors is very expensive, and
should therefore be done as infrequently as possible. This can be achieved by
giving precedence to updates involving projectors. Specifically, the updating
is split into two phases. The first consists of the exhaustive updating using
projectors. The second consists of updating using intersectors. The
algorithm then proceeds by repeatedly alternating these two phases. As a side
effect, this organization of updating steps also enhances the performance
of the DNF minimization algorithm, since by delaying its application, the
amount of redundancy increases. The algorithm can now be described as

    repeat
        exhaustively apply all possible projector steps;
        recompute each intersector;
    until no change;

Approximating  Subsumption  Relationships 

We conclude this section by describing an enhancement to the performance of
intersector recomputation. This modification is perhaps the most important
described so far. Consider an intersector $\mathcal{X} \cap \mathcal{Y}$. If
$con(\mathcal{X}) = \{A, B\}$ and $con(\mathcal{Y}) = \{C, D\}$ and in the least
model of the equations, $A \subseteq B$ and $C \subseteq D$, then clearly the
recomputation of the intersector can be reduced to computing $B \cap D$.
However, establishing whether subsumption relationships such as
$A \subseteq B$ hold in the least model can only be done, in general, once the
least model is known. The approach used here involves approximating the
least model using the information contained in the array $con$. Specifically,
this array defines an equation for each variable $\mathcal{X}$:

$$\mathcal{X} = a_1 \cup \cdots \cup a_n, \text{ where } con(\mathcal{X}) = \{a_1, \ldots, a_n\}.$$

Hence $con$ can be considered to define a collection of equations in explicit
form, and this in turn defines an interpretation, call it $\mathcal{I}_{con}$.
This interpretation represents the part of the least model that has been
computed by the algorithm so far. If $\mathcal{I}_{lm}$ denotes the least model
of the constraints input to the algorithm, then
$\mathcal{I}_{con}(\mathcal{X}) \subseteq \mathcal{I}_{lm}(\mathcal{X})$ for all
variables $\mathcal{X}$ appearing in the input constraints.

Importantly, I_con provides a useful approximation of the subsumption
relationships that hold in I_lm, and can be used to simplify the intersector
computation as follows. First, we use I_con to compute the non-redundant
components of the sets con(X). Let S be a set of constructed expressions,
let I be an interpretation and define that ce is a maximal element of S with
respect to I if ce ∈ S and for each ce' ∈ S, I(ce') ⊅ I(ce). In other words,
there is no expression in S that is strictly larger than ce. Then, given the
sets con(X), define for each X the set of maximal constructed expressions
as follows:

max_con(X) ≝ {ce ∈ con(X) : ce is maximal wrt I_con}    (8.39)

If con(X) contains several maximal elements that are equal under I_con, then
we put one of them in max_con(X) and discard the others. Finally, the
updating of each intersector is carried out using the constructed expressions
in max_con instead of con.
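
The selection in (8.39) can be illustrated with a small sketch. It assumes, purely for illustration, that each constructed expression denotes a finite set under the current interpretation (in the algorithm these are sets of terms and are usually infinite); the function name, the dictionary encoding and the sample data are hypothetical and are not taken from the implementation.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List

# Hypothetical encoding: an interpretation maps each constructed
# expression to the finite set of values it currently denotes.
Interp = Dict[Hashable, FrozenSet]


def max_con(con_x: Iterable[Hashable], interp: Interp) -> List[Hashable]:
    """Return the expressions of con_x that are maximal under interp.

    An expression is dropped when some other expression in con_x denotes
    a strict superset of its value. Among maximal expressions that denote
    the same set, only the first one encountered is kept.
    """
    exprs = list(con_x)
    kept: List[Hashable] = []
    seen_values = set()
    for ce in exprs:
        value = interp[ce]
        if any(interp[other] > value for other in exprs):
            continue  # strictly subsumed by another expression
        if value in seen_values:
            continue  # equal to a maximal expression already kept
        seen_values.add(value)
        kept.append(ce)
    return kept


# Sample data: A is strictly below B, and C equals D.
interp = {
    "A": frozenset({1}),
    "B": frozenset({1, 2}),
    "C": frozenset({3}),
    "D": frozenset({3}),
}
result = max_con(["A", "B", "C", "D"], interp)
```

With this data only B and one of the two equal expressions C and D survive, so an intersector over such a set would be updated from two expressions rather than four.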

In general, there is no formal connection between the subsumption
relationships that hold in I_lm and I_con, and the sets max_con(X) are
essentially arbitrary subsets of con(X). However, in practice the correlation
between the subsumption relationships that hold in I_lm and I_con is close,
especially after the early stages of the algorithm. In essence this is
because I_con increases quickly in the early stages of the algorithm, and for
most of the algorithm's execution it closely approximates I_lm. We shall
provide some empirical evidence to support this claim in the next section.

We now briefly outline the correctness of this modified intersector
recomputation. Since the modified intersector recomputation serves to reduce
the number of constructed expressions that would otherwise be generated, the
main change in the proof from the basic intersection-projection algorithm
relates to completeness. In essence, the key to the proof involves showing
that the modified algorithm produces sufficient constructed expressions so
that I_con is a model of the constraints. Modifications to the previous
completeness proof are needed when the definitions of transformations are
used to argue that, on termination, certain constraints are present. The main
observation required is that when max_con is computed from con using (8.39),
the explicit form constraints described by max_con, call them C_max_con, are
such that lm(C_max_con) = I_con.

Additional  Improvements 

The preceding sections contain only the major design decisions of our
implementation. However, there are a number of minor modifications which, in
sum, have an important impact on the system's performance. We now very
briefly summarize a number of these. During the execution of the algorithm,
many variables have the same con sets. Typically the number of distinct sets
con(X), where X ranges over all set variables, is approximately a third of
the total number of set variables. Hence a number of operations only need to
be computed once for each distinct set con(X), and this can lead to
important savings, particularly in the subsumption component of the algorithm.
The computation of intersectors is another place where many improvements
can be made. First, it is worthwhile to store the sequences of constructed
expressions that have been considered at each intersector. Then, when
recomputing an intersector, any sequence that has been generated previously
can be ignored. Second, an intersector only needs to be recomputed if the
variables on which it depends have changed since the last time it was
recomputed. By keeping track of the changes to these variables, the number
of intersector recomputations can be reduced by up to 50%.
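
Two of these savings can be sketched as follows. This is only an illustration under assumed data structures: the dictionary encoding of con, the function names and the sample operation are hypothetical, and the actual implementation (written in Standard ML) is not shown in the text.

```python
from typing import Callable, Dict, FrozenSet, List, Set


def once_per_distinct_set(
    con: Dict[str, FrozenSet[str]],
    op: Callable[[FrozenSet[str]], int],
) -> Dict[str, int]:
    """Apply op to con(X) for every X, evaluating op once per distinct set."""
    cache: Dict[FrozenSet[str], int] = {}
    result: Dict[str, int] = {}
    for var, exprs in con.items():
        if exprs not in cache:
            cache[exprs] = op(exprs)  # first variable with this con set
        result[var] = cache[exprs]    # later variables reuse the result
    return result


def needs_recompute(depends_on: Set[str], changed: Set[str]) -> bool:
    """An intersector is revisited only if one of its variables changed."""
    return not depends_on.isdisjoint(changed)


calls: List[FrozenSet[str]] = []


def counting_size(exprs: FrozenSet[str]) -> int:
    """Stand-in for an expensive per-set operation; records each call."""
    calls.append(exprs)
    return len(exprs)


con = {
    "X": frozenset({"a", "f(X)"}),
    "Y": frozenset({"a", "f(X)"}),
    "Z": frozenset({"b"}),
}
sizes = once_per_distinct_set(con, counting_size)
```

Here X and Y share a con set, so the stand-in operation runs twice for three variables; `needs_recompute` is the test that would guard each intersector recomputation.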

8.5  Empirics 

The  implementation  described  in  this  paper  has  been  developed  over  a  pe¬ 
riod  of  three  years.  A  very  early  version  used  an  explicit  representation 
for  constructed  expressions.  However  the  comparison  of  constructed  expres¬ 
sions  was  prohibitively  expensive  and  only  very  small  programs  could  be 
analyzed.  The  next  version  employed  integer  term  representation  as  well 
as  an  early  form  of  the  modifications  for  intersector  simplification  described 
in  the  previous  section.  Most  of  the  basic  notions  described  in  this  paper 
were  contained  in  this  version.  The  current  version  re-implements  these 
notions  more  efficiently  and  makes  greater  use  of  dependency  directed  up¬ 
dating.  The  subsumption  algorithm  was  completely  redesigned  using  bit 
vectors. The system consists of approximately 4,500 lines of Standard ML
(this includes programs to construct the set constraints corresponding to a
program as well as the set constraint solver and its front end).

             V1 (Nov 89)       V2 (Nov 90)       V3 (Nov 91)
             time     eqns     time     eqns     time     eqns
nrev-bu      0        18       0.03     14       0.01     10
nrev-td      3.7      50       0.48     48       0.21     35
imp-bu       280      400      4.7      233      0.9      128
imp-td       ??       ??       21       459      5.3      372
dnf-bu       >2600    >511     2.1      142      0.65     96
dnf-td       ??       ??       210      1002     12.59    440

Table 8.1: Impact of Design Decisions

Table 8.1 compares these three versions of the algorithm. The benchmarks
used in this table are based on three small logic programs. The first
is the standard naive reverse program for reversing lists (which
consists of four rules, two of them facts). The second is an interpreter for a
simple imperative programming language similar to the language outlined
in Chapter 3. This program consists of 55 rules, 25 of them facts. The
third program is a program to convert a propositional logic formula into a
formula in disjunctive normal form. It contains 32 rules (10 facts and 22
non-facts with an average of about 2 body atoms per non-fact rule) and
contains a large number of mutually recursive calls. For all of these
programs, we consider the set constraints corresponding to both bottom-up and
top-down left-to-right execution. For each version of the algorithm and each
benchmark, we give the time taken to run the benchmark and the number
of equations generated. All timings are on a Sun Sparc 1+ (24MB) running
Mach and using version 0.75 of Standard ML of New Jersey. Note that
for the larger examples, the difference between the first and third
implementations is a factor in excess of one thousand. In fact timings for a
number of benchmarks could not be obtained using the first implementation.

We now give more timings for the third implementation, with particular
emphasis on the design decisions outlined in the previous sections. Table 8.2
presents a breakdown of the time and space behavior of the algorithm. Two
additional benchmarks are used in this table. The constraints for these
benchmarks are given in Figure 8.7, where f^11(X) denotes 11 applications
of f to X and f^-33(X) denotes 33 applications of the projector f^-1_(1) to X.
These are essentially pathological cases designed for testing the robustness
of the intersector and projector parts of the algorithm respectively. Neither
benefits from the subsumption component of the algorithm.

           time                             space
           total    proj   sub    ∩         eqns       ∩ eqns     con     model
           (secs)   (%)    (%)    (%)                             exps    size
nrev-bu    0.01     0      100    0         10+0       0+0        11
nrev-td    0.21     10     76     14        20+15      8+5        20      26
imp-bu     0.9      8      85     7         120+138    13+12      119
imp-td     5.3      8      80     12        246+126    246+126    231
dnf-bu     0.65     29     68     3         88+8       3+4        82
dnf-td     12.59    6      59     35        156+284    53+254     313
lcm        13.86    0      99     1         3+142      3+142      171
hcf        27.54    1      99     0         1+33       0+0        148     4

Table 8.2: Time and Space Measurements for V3


lcm:    X = a ∪ f^11(X)
        Y = a ∪ f^[...](Y)
        Z = X ∩ Y

hcf:    X = a ∪ f^-33(X) ∪ f^[...](X)

Figure 8.7: The lcm and hcf Benchmarks

For each benchmark, Table 8.2 gives timing information, equation counts
and an indication of the space used. The first column of the table is total
time, and the next three columns break this down into time spent on
projectors, inferring subsumption relationships and intersectors, each
expressed as a percentage of total time. Column 5 gives the total number of
equations in the form x + y where x is the number of equations in the
benchmark, and y is the number of additional equations generated during
execution. Column 6 gives the number of equations whose right hand side
contains an intersector. Again entries are of the form x + y where x is the
number of intersectors in the input equations, and y is the number of
intersectors introduced during execution. Column 7 is the total number of
distinct constructed expressions. Column 8 provides a measure of the
complexity of the least model constructed by the algorithm, and is obtained
as follows. First, list all of the sets max_con(X), for each set variable X.
Some of these sets will be the same (in which case the corresponding
variables are equal in the least model). The complexity measure is then
obtained by summing the cardinalities of the distinct sets that appear in
this list. These results show that bottom-up constraints are substantially
easier to solve than top-down left-to-right constraints. This general
relationship holds because, by their nature, top-down constraints are more
complicated and more accurate than the bottom-up constraints. The subsumption
approximation part of the algorithm accounts for the majority of execution
time. The lcm and hcf examples show extreme behavior. In the case of lcm, the
output for X is X = a ∪ [...], and for hcf the output is X = a ∪ f^[...](X).

           subsumption                         no subsumption
           time     total   con     table      time     total   con     table
           (secs)   eqns    exps    size       (secs)   eqns    exps    size
nrev-bu    0.01     10      11      16         0        10      11      16
nrev-td    0.21     35      20      75         0.11     35      20      75
dnf-bu     0.65     96      82      1689       1.01     184     126     5309
dnf-td     12.59    440     313     9317       ≈218     2669    1510    280K
lcm        13.86    145     171     292        0.26     145     171     292
hcf        27.54    34      148     1644       0.31     34      148     1644

Table 8.3: Effects of the Subsumption Modification
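
The model size measure of column 8 (list the sets max_con(X), merge those that coincide, and sum the cardinalities) can be written down directly. The sketch below assumes max_con is available as a mapping from variable names to finite sets of expression labels; this encoding and the sample data are hypothetical.

```python
from typing import Dict, Set


def model_size(max_con: Dict[str, Set[str]]) -> int:
    """Sum the cardinalities of the distinct sets max_con(X)."""
    # Variables with identical max_con sets contribute only once.
    distinct = {frozenset(exprs) for exprs in max_con.values()}
    return sum(len(exprs) for exprs in distinct)


# Sample data: X and Y have the same set, so it is counted once.
example = {
    "X": {"a", "f(X)"},
    "Y": {"a", "f(X)"},
    "Z": {"b"},
}
size = model_size(example)
```

For this data the measure is 3: the shared two-element set contributes 2 and the set for Z contributes 1.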

The driving example used during the development of this implementation
was the top-down left-to-right constraints for the dnf program. A number
of features of this program lead to top-down constraints that are unusually
expensive to solve. In particular, it was this program that motivated the
subsumption approximation part of the implementation. Table 8.3
illustrates the effects of the subsumption component of the algorithm on all
of the example constraints. For the analysis of nrev, and for the lcm and hcf
constraints, there is little redundancy and the subsumption component is
expensive and provides no direct benefit. However for the more substantial
dnf program, it is clearly of benefit, and in the case of top-down analysis, it
is crucial. Specifically, in this case the difference in time is nearly a factor
of 20, and for one measure of memory usage (the number of entries in the
central hash table of the implementation), the difference is a factor of 30.
(In fact the measurements for this entry in the table had to be made on a
64MB DECstation 5000, and then converted to equivalent Sparc 1+ times.)

Table 8.4 illustrates the dynamic behavior of the algorithm by giving
cumulative measures of the consumption of resources during execution of
the top-down dnf constraints. One "iteration" corresponds to an exhaustive
application of the projector transformation followed by a recomputation of
each intersector. The measure of the amount of the least model computed
at each stage is obtained as follows. Let max_con(X)|final denote the value
of the set max_con(X) on termination of the algorithm. Let δ(X) denote
the proportion of the elements in max_con(X)|final that currently appear
in con(X); this represents a pessimistic estimate of the proportion of X
that has been currently computed. Finally, the measure of the amount of
the least model computed is just the average of δ(X) over all variables X
currently appearing in the equations. The behavior of this measure indicates
that the model grows quickly in the early stages, and that most of the time
is spent obtaining the last few components of the model. This supports
the notion that during most of the execution of the algorithm, max_con(X)
provides a fairly good approximation to the least model for the purposes of
obtaining subsumption relationships.

iterations    1     2     3     4     5     6     7     8     9
% time        2     8     18    31    47    61    81    91    100
% ∩ eqns      19    24    40    67    85    96    100   100   100
% model       23    43    58    68    78    86    93    100   100

Table 8.4: Cumulative Consumption of Resources During Execution.
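
The progress measure used for the "% model" figures can be sketched as below. The sketch assumes con and the final max_con sets are given as mappings from variable names to finite sets of expression labels, and it skips variables whose final set is empty; both the encoding and that treatment of empty sets are assumptions made here, not details given in the text.

```python
from typing import Dict, Set


def model_progress(
    con: Dict[str, Set[str]],
    max_con_final: Dict[str, Set[str]],
) -> float:
    """Average, over variables currently in con, of the share of
    max_con_final(X) that already appears in con(X)."""
    ratios = []
    for var, current in con.items():
        final = max_con_final.get(var, set())
        if not final:
            continue  # assumption: variables with no final elements are skipped
        ratios.append(len(final & current) / len(final))
    return sum(ratios) / len(ratios) if ratios else 1.0


# Sample data: Z has not been introduced yet, so it is not averaged.
con_now = {"X": {"a"}, "Y": {"b", "c"}}
final = {"X": {"a", "f(X)"}, "Y": {"b", "c"}, "Z": {"d"}}
progress = model_progress(con_now, final)
```

In this example half of X and all of Y have been computed, giving a progress value of 0.75.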

To  illustrate  the  effects  of  the  minimal  dnf  expansions  component  of  in¬ 
tersector  simplification,  we  again  use  the  top-down  dnf  constraints.  When 
applied to these constraints, the implementation constructs 1270
intersections. In comparison, a simple expansion of these formulas would have
led to 1921 intersections. Although this only represents a direct saving of 649
intersections, or about a third, the indirect savings, in terms of intersection
variables that do not need to be introduced, is more significant.

In  summary  then,  we  have  worked  with  a  moderate  sized  program  (dnf) 
that  exhibits  properties  that  are  problematic  for  analysis,  such  as  substantial 
mutual  recursion  and  intersection.  Very  significant  progress  has  been  made 
and  this  example  can  now  be  analyzed  within  a  reasonable  time  and  space 
bound.  While  this  provides  strong  evidence  that  set  based  analysis  can  be 
made  practical,  much  work  remains.  Our  experience  suggests  that  a  number 
of  components  of  the  algorithm  can  be  further  improved.  Currently  the 
major  expense  is  the  subsumption  algorithm,  and  one  reason  for  this  is  that 
it is not by nature incremental. We are investigating ways of overcoming
this, including trading some of its effectiveness for the ability to reuse results
from  previous  iterations.  Another  avenue  for  improvement  lies  in  the  use  of 
specialized  hashing  techniques. 

Thus far we have focussed extensively on efficiency aspects of the
analysis. However we have now reached a stage where moderate sized programs
can be analyzed. We therefore plan to use this implementation to investigate
set based analysis, with particular emphasis on the quality of the
information obtained* and its relevance to compilation. Practical program
analysis must also deal with operations such as call, assert and retract.
Preliminary work suggests that it may be possible to directly model these
operations in a set constraint framework.

We  conclude  with  a  discussion  of  related  work.  Most  of  the  work  on 
implementation  of  analysis  for  logic  programs  is  based  on  abstract  interpre¬ 
tation  and  the  algorithms  used  are  fundamentally  different  from  those  for 
solving  set  constraints.  Our  work  is  more  closely  connected  to  work  on  types 
for  logic  programs  in  which  types  are  defined  by  ignoring  inter-variable  or 
inter-argument  dependencies.  Implementations  have  been  reported  in  [55] 
and  [68],  but  the  former  does  not  focus  on  practical  issues,  and  the  latter  is 
not  directly  comparable  to  our  work  since  it  deals  with  type  checking  rather 
than  type  inference.  The  most  closely  related  work  to  ours  is  [4],  which 
describes  an  implementation  of  type  inference  for  the  functional  language 
FL.  In  very  general  terms  their  observations  about  the  complexity  of  the 
intersection  operation  are  similar  to  ours.  However,  the  two  algorithms  are 
completely  different  in  nature.  In  particular  their  algorithm  does  not  include 
the  projection  operation,  and  this  appears  to  substantially  alter  the  tradeoffs 
and  design  decisions  that  are  made.  Furthermore,  our  implementation  has 
been  specifically  designed  to  deal  with  constraints  where  there  is  substantial 
mutual  dependency  between  variables.  Such  constraints  are  typical  in  the 
constraints  generated  for  the  top-down  analysis  of  logic  programs. 


*As an example of the output of the algorithm, the top-down analysis of the dnf program
constructs the following set X to describe the set of terms that are the result of putting an
arbitrary input formula into dnf: X = or(X, X) ∪ and(X, X) ∪ not(C) where C describes the set
of propositional constants.


Part  III 


Extensions  and  Future  Work 


The  developments  of  Parts  I  and  II  have  focussed  on  approximat¬ 
ing  the  run-time  values  of  program  variables  in  imperative  and  logic 
programs.  The  main  reason  for  this  focus  was  to  provide  a  concrete 
setting  for  the  formalization  of  set  based  analysis.  However,  the  un¬ 
derlying  ideas  of  set  based  analysis  are  by  no  means  restricted  to  this 
kind  of  analysis.  We  now  show  how  they  can  be  extended  to  capture 
information  other  than  variable  values,  reason  about  inter-variable 
dependencies, and analyze functional languages. The style of
presentation for this part shall emphasize examples and intuition rather
than formal details.

We begin by outlining how set constraints may be used for
computing program properties such as modes (in logic programming)
and structure sharing. We then address the issue of inter-variable
dependencies. It is clear that many kinds of analysis benefit from
information about inter-variable dependencies, and in this respect
set based analysis is deficient. We therefore develop an approach
to program analysis that combines the accuracy of reasoning about
program structures inherent in set based analysis, with the ability of
abstract interpretation to reason about inter-variable dependencies.
This is carried out in the context of logic programs, and leads to a
hybrid logic program analysis engine that is more accurate than either
abstract interpretation or set constraints. We conclude by outlining
how set based analysis may be applied to functional languages. We
focus on Standard ML, and illustrate connections with type systems,
particularly those based on simple subtypes.

Chapter 9

Modes and Structure Sharing


The core part of this thesis focuses on the possible run-time values of
program variables. However, the basic process of constructing set constraints
from a program and then solving these set constraints preserves numerous
structural properties of a program. It is therefore possible, with only minor
modifications to the set constraint algorithm, to compute approximations to
a variety of other program properties. We illustrate this with two kinds of
analysis. First, we outline the computation of instantiation levels of variables
during execution of a logic program (mode analysis). Second, we describe
the use of set based analysis to obtain information about the sharing of
structures between program variables. This chapter provides only an
informal description of what is involved for these two extensions. Its purpose
is to demonstrate that the techniques of set based analysis are not tied to a
specific kind of analysis, just as they are not tied to a specific operational
model.

3. ← p(X)^1, q(X)^2.
4. p(f(b, Y)).
5. q(f(Z, c)).

X^1    = ⊤
X^2    = [...]
X^3    = p^-1_(1)(Ret_p) ∩ q^-1_(1)(Ret_q)
Y^4    = f^-1_(2)(p^-1_(1)(Call_p))
Z^5    = [...]
Call_p = p(X^1)
Call_q = q(X^2)
Ret_p  = p(f(b, Y^4))
Ret_q  = q(f(Z^5, c))

Figure 9.1: Mode Analysis Example


9.1  Mode  Analysis 


Mode analysis involves determining information about the instantiation of
variables during the execution of a program. For example, at some point
during program execution, a variable X could be uninstantiated (the values
of X are not constrained), instantiated to some fixed value (there is only
one possible value for X), or else partially determined (some of the structure
of X is fixed, but the value of X is not uniquely determined). In the first
case X is said to be free, in the second case X is said to be ground, and in
the third case X is said to be partially instantiated. Consider the (labeled)
program in Figure 9.1, and suppose that the goal ← p(X), q(X) is executed
in a left-to-right manner. When p(X) is selected from the goal, X is free.
When q(X) is selected, X is partially instantiated. Finally, when the goal
has completed execution, X is ground. Note how the instantiation of X
changes as the goal executes.

To illustrate the use of set constraints to compute mode information,
we shall focus on groundness information. That is, we shall seek to infer
when a program variable is ground. Throughout this chapter, we shall
use set constraints of the form X_1 = se_1, ..., X_n = se_n where X_1, ..., X_n
are distinct set variables, and each se_i is constructed from union, function
symbols, set variables and the intersection and projection operators. Recall
that in Chapter 8 we outlined how such constraints could be obtained from a
program (see Section 8.1, page 259). Such constraints are more compact but
slightly less accurate than SC_P. Consider the program and its constraints
shown in Figure 9.1. The output of the set constraint algorithm on these
constraints is:

X^1 ⊇ ⊤
X^2 ⊇ f(b, Y^4)
X^3 ⊇ f(b, c)
Y^4 ⊇ ⊤
Z^5 ⊇ ⊤

Now, if we interpret ⊤ as the set of all terms (not just ground terms), then
the least model of the above constraints is

X^1 ↦ {all terms}
X^2 ↦ {f(b, s) : s is an arbitrary term}
X^3 ↦ {f(b, c)}
Y^4 ↦ {all terms}
Z^5 ↦ {all terms}

and this correctly describes information about the groundness of X during
program execution. This process can be formalized by re-interpreting set
constraints so that they map into arbitrary sets of terms, rather than sets
of ground terms. The details are considerable and are beyond the scope of
the thesis. We instead give some intuition and some further examples.

The underlying reason why set constraints can be used to obtain mode
information is that they correspond closely to the operational behavior of a
program. Broadly speaking, union is used to model rule choices, projections
are used to model the matching of one atom against another, intersection is
used to model unification, and top corresponds to an uninstantiated variable.
Moreover, the set constraint algorithm preserves this correspondence. In
particular, the simplification of the intersection operator (conservatively)
models the behavior of intersection. For example, the set expression f(b) ∩ ⊤
is essentially simplified into f(b), corresponding to the unification of f(b)
with an uninstantiated variable. Note that during the algorithm ⊤ cannot
be (safely) interpreted as an uninstantiated variable. Instead it must be
interpreted as an unknown structure. This is essentially because aliasing
effects are ignored.

Consider the append program in Figure 9.2. When such a program is
analyzed, we typically wish to infer properties such as: what are the possible
"modes" of calls to append, given that it is called with first and second
arguments ground. To express this compactly, it is convenient to introduce
a new constant ★ to denote the set of ground terms. That is, ⊤ denotes all
terms and ★ denotes just ground terms. Corresponding to this new constant,
we have some new simplification rules:

f^-1_(i)(★) = ★

★ ∩ ⊤ = ★

★ ∩ f(s_1, ..., s_n) = f(★ ∩ s_1, ..., ★ ∩ s_n)

Note that ★ ∩ t cannot just be simplified into t since t may contain
non-ground terms, but ★ ∩ t is always ground. For example, ★ ∩ f(⊤) is not the
same as f(⊤).

    app(nil, W, W).
    app(cons(X, L), Y, cons(X, Z)) ← app(L, Y, Z).

    [goal clause, program point labels and top-down constraint listing
    not recoverable from the source]

Figure 9.2: The Append Program and Its Top-Down Constraints

The implementation described in Chapter 8 has been extended to
provide mode information using the methods just outlined. Although it adds a
number of special cases, particularly during the computation of intersectors,
the overhead of these modifications appears to be negligible. In fact a
number of the benchmarks used in Chapter 8 employ an initial goal involving ★
and ⊤. For example, the top-down benchmarks involving the naive reverse
program, the imperative language interpreter and the dnf program used the
initial goals ← nrev(★, ⊤), ← eval(★, ⊤) and ← dnf(★, ⊤) respectively. In each
case, the modes computed are the obvious ones (for example, all calls to nrev
have first argument ground, and all calls to append have first and second
arguments ground).

To give some intuition about the accuracy of the mode information
generated, consider a simple abstract interpretation algorithm that uses the
approximate values any and ground and represents substitutions as mappings
from program variables into {any, ground}. The information computed by
the set constraint approach will be strictly more accurate than this simple
abstract interpretation algorithm. In particular, it will perform better on
programs involving structural information. For example, the set constraint
approach determines that after execution of the goal ← p(f(X, b)) in the first
program of Figure 9.3, X will be ground, whereas the abstract interpretation
algorithm will be unable to determine this fact.

However, the set constraint approach ignores inter-variable
dependencies, and these are clearly important for mode analysis. For example,
in the second program of Figure 9.3, the set constraint approach is not able
to determine that after execution of ← eq(X, Y), r(Y), the variable X is
ground. In contrast, abstract interpretation approaches have been developed
to capture information about the aliasing and sharing behavior of variables;
see [15, 27, 44, 51, 59] for example. These dependency-based approaches infer
that both X and Y are ground for this program. They are, however, rather
weak in reasoning about partially instantiated structures. For example, in
the first program, the dependency-based approaches cannot infer that X
is ground since this requires reasoning about the structure of the terms to
which U can be bound.

← p(f(X, b)).        ← eq(X, Y), r(Y).        ← p(Y), r(Y, X).
p(U) ← q(U).         eq(U, U).                p(U) ← q(U).
q(f(a, V)).          r(a).                    q(f(a, V)).
                                              r(f(W1, W2), W1).

Figure 9.3: Three Programs Illustrating Accuracy of Mode Analysis

In summary, the set constraint approach is more accurate on some
programs because of its superior ability to reason about term structure, and
the abstract interpretation approach is more accurate on some programs
because of its ability to capture information about inter-variable dependencies.
Note that neither approach is able to determine that X is ground after the
execution of the goal ← p(Y), r(Y, X) in the third program of Figure 9.3.
This motivates the development of hybrid approaches that incorporate set
based techniques (for reasoning about term structure) and abstract
interpretation techniques (for reasoning about dependencies). We shall consider
such an approach in Chapter 10.

← p(X, Y).           ← p(X, Y).
p(Z, Z) ← q(Z).      p(f(b), f(b)).
q(f(b)).

Figure 9.4: Sharing of Structures Between Variables


9.2  Sharing  Analysis 


We now outline how set constraints can be used to obtain information about
the sharing of structures between program variables. Consider the two
programs in Figure 9.4. While the declarative semantics of both programs is
the same (the answer substitution for ← p(X, Y) is [X ↦ f(b), Y ↦ f(b)] in
both cases), there are important differences between these programs at the
implementation level. In particular, after execution of the goal in the first
program, it is usually the case that X and Y are bound to the same "copy"
of the structure f(b). In contrast, after execution of the goal in the second
program, X and Y are typically bound to different "copies" of the structure
f(b).

Such  differences  become  important  when  one  considers  optimizations 
involving  reuse  of  structures.  To  iUustrate  such  optimizations,  consider  the 
following  rule 

p(X,Y,L)  ^  qiX,LX),  r(Y,LY),  app{LX ,  LY ,  L). 

where  app  denotes  the  standard  list  append  predicate  (which  appears  in 
Figure  9.2).  Suppose  that  it  has  already  been  determined  that  whenever  this 
rule  is  called,  X  and  Y  are  both  ground  and  do  not  contain  any  occurrences 
of the list constructor cons, and L is free. Now, the rule executes by
calling q and r, which generate lists LX and LY from the input values of X
and Y, and then combining the results. If we can determine after execution
of q(X,LX) and r(Y,LY) that LX and LY are both ground and do not share
any  cons  symbols,  then  the  operation  of  appending  LX  and  LY  to  give 
L  can  be  significantly  improved  by  modifying  the  execution  of  app  so  that 
the  structure  of  LX  is  reused.  In  effect,  this  modified  append  operation 
destructively  updates  the  end  of  the  list  LX  to  point  to  the  list  LY. 
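
The difference between the standard append and the structure-reusing
variant can be sketched as follows. This is only an illustration in Python
under stated assumptions: the `Cons` class and the function names are
hypothetical and stand in for heap cells of a logic programming
implementation, and the destructive version is safe only when no other
variable shares the cells of the first list, which is exactly what the
sharing analysis is meant to establish.

```python
class Cons:
    """A mutable list cell; None plays the role of nil (hypothetical model)."""
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def from_list(items):
    """Build a chain of Cons cells from a Python list."""
    cell = None
    for item in reversed(items):
        cell = Cons(item, cell)
    return cell


def to_list(cell):
    """Read a chain of Cons cells back into a Python list."""
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out


def append_copy(lx, ly):
    """Standard append: every cell of lx is copied, ly is shared."""
    if lx is None:
        return ly
    return Cons(lx.head, append_copy(lx.tail, ly))


def append_reuse(lx, ly):
    """Destructive append: the last cell of lx is updated to point to ly.

    Correct only if nothing else refers to the cells of lx.
    """
    if lx is None:
        return ly
    last = lx
    while last.tail is not None:
        last = last.tail
    last.tail = ly
    return lx
```

The copying version allocates one new cell per element of LX, whereas the
reusing version allocates nothing and performs a single pointer update.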

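We now show how set constraints can be adapted to compute the kind of
sharing information needed for structure reuse optimizations. The set
constraints corresponding to the two programs in Figure 9.4 are given in
Figure 9.5. After substituting the definitions of $Call_p$, $Ret_p$ and
performing some obvious simplifications of the projection symbols, these
constraints become (first program on the left, second program on the
right):

\[
\begin{array}{l@{\qquad\qquad}l}
\mathcal{X}^1 = \top & \mathcal{X}^1 = \top \\
\mathcal{Y}^1 = \top & \mathcal{Y}^1 = \top \\
\mathcal{X}^2 = \mathcal{Z}^4 & \mathcal{X}^2 = f(b) \\
\mathcal{Y}^2 = \mathcal{Z}^4 & \mathcal{Y}^2 = f(b) \\
\mathcal{Z}^3 = \top \cap \top & \\
\mathcal{Z}^4 = \top \cap \top \cap f(b) &
\end{array}
\]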




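First program:

    ←¹ p(X,Y)².
    p(Z,Z) ←³ q(Z)⁴.
    q(f(b)).

\[
\begin{array}{l}
\mathcal{X}^1 = \top \\
\mathcal{Y}^1 = \top \\
\mathcal{X}^2 = p^{-1}_{(1)}(Ret_p) \\
\mathcal{Y}^2 = p^{-1}_{(2)}(Ret_p) \\
\mathcal{Z}^3 = p^{-1}_{(1)}(Call_p) \cap p^{-1}_{(2)}(Call_p) \\
\mathcal{Z}^4 = p^{-1}_{(1)}(Call_p) \cap p^{-1}_{(2)}(Call_p) \cap q^{-1}_{(1)}(Ret_q) \\
Call_p = p(\mathcal{X}^1, \mathcal{Y}^1) \\
Call_q = q(\mathcal{Z}^3) \\
Ret_p = p(\mathcal{Z}^4, \mathcal{Z}^4) \\
Ret_q = q(f(b))
\end{array}
\]

Second program:

    ←¹ p(X,Y)².
    p(f(b),f(b)).

\[
\begin{array}{l}
\mathcal{X}^1 = \top \\
\mathcal{Y}^1 = \top \\
\mathcal{X}^2 = p^{-1}_{(1)}(Ret_p) \\
\mathcal{Y}^2 = p^{-1}_{(2)}(Ret_p) \\
Call_p = p(\mathcal{X}^1, \mathcal{Y}^1) \\
Ret_p = p(f(b), f(b))
\end{array}
\]

Figure 9.5: Set Constraints for Programs in Figure 9.4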


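Consider giving a unique label to each occurrence of a function symbol in
the set constraints constructed from a program. Using labels ①, ② etc., the
above constraints become (first program on the left, second program on the
right):

\[
\begin{array}{l@{\qquad\qquad}l}
\mathcal{X}^1 = \top & \mathcal{X}^1 = \top \\
\mathcal{Y}^1 = \top & \mathcal{Y}^1 = \top \\
\mathcal{X}^2 = \mathcal{Z}^4 & \mathcal{X}^2 = f^{①}(b^{②}) \\
\mathcal{Y}^2 = \mathcal{Z}^4 & \mathcal{Y}^2 = f^{③}(b^{④}) \\
\mathcal{Z}^3 = \top \cap \top & \\
\mathcal{Z}^4 = \top \cap \top \cap f^{①}(b^{②}) &
\end{array}
\]

Now, the definition of interpretation of set constraints can be extended to
deal with labeled constraints as follows. First define that a labeled term
is a term in which some of the function symbols are labeled. Now define
that a labeled interpretation is a mapping from set variables into sets of
labeled terms. Such an interpretation is extended to map from labeled set
expressions into sets of labeled terms. For example
$\mathcal{I}(se_1 \cup se_2) = \mathcal{I}(se_1) \cup \mathcal{I}(se_2)$,
$\mathcal{I}(f^{-1}_{(i)}(se)) = \{s_i : f(s_1,\ldots,s_n) \in \mathcal{I}(se)
\text{ or } f^{\lambda}(s_1,\ldots,s_n) \in \mathcal{I}(se)\}$ where
$\lambda$ ranges over labels, and $\mathcal{I}(\top)$ is the set of all
unlabeled terms. Models and least models are defined in the expected way.
For example, in the least models of the above constraints, the sets for
$\mathcal{X}^2$ and $\mathcal{Y}^2$ are respectively:

\[
\begin{array}{l@{\qquad\qquad}l}
\mathcal{X}^2 = \{f^{①}(b^{②})\} & \mathcal{X}^2 = \{f^{①}(b^{②})\} \\
\mathcal{Y}^2 = \{f^{①}(b^{②})\} & \mathcal{Y}^2 = \{f^{③}(b^{④})\}
\end{array}
\]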

Now, in the least model of the first constraints, the sets for
$\mathcal{X}^2$ and $\mathcal{Y}^2$ contain terms with common labels, and
this indicates the possibility of sharing of structures between these two
variables. In contrast, in the least model of the second constraints, the
sets for $\mathcal{X}^2$ and $\mathcal{Y}^2$ do not contain terms with
common labels, and this reflects the fact that there can be no sharing of
structures between these two variables.
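
The check for common labels can be sketched in Python as follows. The
representation is an assumption made only for this illustration: a labeled
term is a tuple `(functor, label, args)` where `label` is an integer or
`None`, and the function names `labels` and `may_share` are hypothetical.

```python
def labels(term):
    """Collect every label that occurs anywhere in a labeled term."""
    functor, label, args = term
    found = set() if label is None else {label}
    for arg in args:
        found |= labels(arg)
    return found


def may_share(terms_x, terms_y):
    """Report possible sharing: some label occurs in both sets of terms."""
    labels_x = set().union(*(labels(t) for t in terms_x))
    labels_y = set().union(*(labels(t) for t in terms_y))
    return bool(labels_x & labels_y)


# f(b) with labels 1 and 2, and a second occurrence with labels 3 and 4.
f1_b2 = ("f", 1, (("b", 2, ()),))
f3_b4 = ("f", 3, (("b", 4, ()),))
```

With these definitions, `may_share([f1_b2], [f1_b2])` models the first
program (possible sharing) and `may_share([f1_b2], [f3_b4])` models the
second (no sharing).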

Note that we have defined $\mathcal{I}(\top)$ to be the set of all
unlabeled terms. Otherwise we could not obtain any useful information from
the least model since it would indicate possible sharing between
$\mathcal{X}^1$ and $\mathcal{Y}^1$. Thus far we have not considered how
intersection is interpreted, and this represents the most difficult part of
the extension. Intuitively, intersection operations correspond to
unification operations during program execution. Now, suppose that a
variable X is bound to the term $f^{①}(s_1)$ and consider unifying this
with $f^{②}(s_2)$. Although this unification step may change the structure
of $s_1$, it cannot change the binding of X. That is, after unification, X
is still bound to an occurrence of $f$ with label ①. To reflect this, we
first define a partial function $\triangle$ for combining labeled terms
such that the leftmost label is preserved. Specifically, let
$t_1,\ldots,t_m$ be labeled terms such that, for some function symbol $f$,
each $t_i$ is either of the form $f(t_{i,1},\ldots,t_{i,n})$ or
$f^{\lambda_i}(t_{i,1},\ldots,t_{i,n})$. Define that
$t_1 \triangle \cdots \triangle t_m$ is $f^{\lambda_j}(s_1,\ldots,s_n)$
where $j \ge 1$ is the smallest index such that $t_j$ is of the form
$f^{\lambda_j}(t_{j,1},\ldots,t_{j,n})$ and each
$s_k = t_{1,k} \triangle \cdots \triangle t_{m,k}$, $k = 1..n$. If $j$ does
not exist, then $t_1 \triangle \cdots \triangle t_m$ is
$f(s_1,\ldots,s_n)$. Finally,
$\mathcal{I}(se_1 \cap \cdots \cap se_m)$ can be defined as follows:

\[
\{ s_1 \triangle \cdots \triangle s_m : s_1 \in \mathcal{I}(se_1), \ldots,
   s_m \in \mathcal{I}(se_m) \}.
\]

This means that $\cap$ is no longer a commutative operation, and reflects
the complexity of reasoning about implementation behavior.
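
A minimal Python sketch of the label-preserving combination may help. It
assumes, for illustration only, that a labeled term is a tuple
`(functor, label, args)` with `label` equal to `None` for an unlabeled
symbol; the function name `combine` is hypothetical, and `None` is returned
where the partial function is undefined.

```python
def combine(terms):
    """Combine labeled terms t1, ..., tm, keeping the leftmost label.

    Returns None when the terms disagree on functor or arity, that is,
    when the partial function is undefined.
    """
    functor, _, first_args = terms[0]
    for other_functor, _, other_args in terms:
        if other_functor != functor or len(other_args) != len(first_args):
            return None
    # The label of the result is that of the first labeled term, if any.
    label = next((t[1] for t in terms if t[1] is not None), None)
    new_args = []
    for k in range(len(first_args)):
        sub = combine([t[2][k] for t in terms])  # combine k-th arguments
        if sub is None:
            return None
        new_args.append(sub)
    return (functor, label, tuple(new_args))


unlabeled = ("f", None, (("b", None, ()),))
f1_b2 = ("f", 1, (("b", 2, ()),))
f3_b4 = ("f", 3, (("b", 4, ()),))
```

Combining `[unlabeled, unlabeled, f1_b2]` corresponds to interpreting
⊤ ∩ ⊤ ∩ f①(b②), and swapping the order of `f1_b2` and `f3_b4` changes the
result, which illustrates the loss of commutativity.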

Corresponding to the changes in the interpretation of labeled set
constraints, the transformations of the set constraint algorithm are
appropriately modified. For example, the transformation for simplifying
projections ignores labels, and simplifies both
$\mathcal{X} \supseteq f^{-1}_{(i)}(f^{\lambda}(s_1,\ldots,s_n))$ and
$\mathcal{X} \supseteq f^{-1}_{(i)}(f(s_1,\ldots,s_n))$ into
$\mathcal{X} \supseteq s_i$. The transformation for simplifying
intersection preserves the leftmost label. For example,
$\mathcal{X} \supseteq \top \cap f^{\lambda_1}(s) \cap f^{\lambda_2}(t)$
is simplified into $\mathcal{X} \supseteq f^{\lambda_1}(\mathcal{V}_N)$
where $N = \mathcal{N}(s) \cup \mathcal{N}(t)$.

We  remark  that  to  prove  such  an  analysis  correct,  we  would  need  to 
start  with  a  much  richer  notion  of  operational  semantics  than  has  been 
used  to  date.  In  particular,  we  would  need  an  operational  semantics  that 
characterizes  the  construction  of  term  structures  during  unification  in  logic 
programming  implementations.  We  also  note  that  a  number  of  details  have 
been  omitted  in  the  analysis  we  have  just  outlined.  For  example,  consider 
the following program, and its (somewhat simplified) set constraints:

    ←¹ p(X,Y)², r(Y)³.
    p(W,W)⁴.
    r(a).

\[
\begin{array}{l}
\mathcal{X}^1 = \top \\
\mathcal{Y}^1 = \top \\
\mathcal{X}^2 = \top \cap \mathcal{W}^4 \\
\mathcal{Y}^2 = \top \cap \mathcal{W}^4 \\
\mathcal{X}^3 = \top \cap \mathcal{W}^4 \\
\mathcal{Y}^3 = \top \cap \mathcal{W}^4 \cap a \\
\mathcal{W}^4 = \top
\end{array}
\]

Now, because of the aliasing of X to Y, there is sharing of structure
between X and Y after the execution of the goal ← p(X,Y), r(Y). However
this is not directly reflected in the set constraints for the program. This
issue is addressed through the use of labels on the ⊤ constants introduced
for the two occurrences of W. More generally, extensions are needed to deal
with multiple occurrences of variables in the heads of rules.


Chapter  10 


The  Unfolding  Engine 


In order to define a simple, intuitive notion of approximation, set based
analysis ignores all inter-variable dependencies. In terms of accuracy, set
based analysis offers a superior ability to reason about data structures.
The disadvantage is a complete inability to reason about inter-variable
dependencies, and it is clear that many kinds of analysis benefit from
information about such dependencies. This motivates an investigation of
ways to re-incorporate restricted forms of inter-variable dependency into
set based analysis. Since abstract interpretation offers a limited ability
to reason about inter-variable dependencies, it is natural to consider
combinations of set based analysis techniques (for accurate reasoning about
data structures) and abstract interpretation (for reasoning about
inter-variable dependencies). In this chapter, we present a hybrid
algorithm for analysis of logic programs that combines these techniques in
a way that enhances the effectiveness of both. In particular, we describe a
parameterized algorithm, called the unfolding engine, that is strictly more
accurate than either abstract interpretation or set constraint approaches.
Importantly, the underlying approach of unfolding semantic constraints can
be easily adapted to other programming languages.


10.1  Motivation


In Section 9.1 we discussed the use of set constraints for mode analysis.
The advantage of the set constraint approach lies in its ability to
accurately and uniformly reason about structure, and this is particularly
important for reasoning about partially instantiated terms. However, the
set based approach ignores all inter-variable dependencies, and such
dependencies are important for reasoning about modes. On the other hand,
abstract interpretation approaches to mode analysis are able to capture
some information about inter-variable dependencies, but are limited in
their ability to reason about term structure. These differences are
illustrated by the three programs in Figure 9.3 on page 286. For the first
of these programs, the set constraint approach performs better than
standard abstract interpretation approaches. For the second, abstract
interpretation approaches are superior. The third program in the figure is

← p(Y), r(Y,X).

p(U) ← q(U).

?(/(«,  n). 

r(f(W1,W2), W1).

and for this program, neither approach is able to determine that X is
ground after execution of the goal ← p(Y), r(Y,X). In this chapter we
describe an approach to analysis that addresses these deficiencies and is
able to determine that X is ground.

Recall that abstract interpretation approaches to analysis can be viewed as
consisting of two main components: an abstract domain and an implementation
of an iterative fixed point computation. Now, the key differences between
these algorithms usually pertain to the design of the abstract domain and
its associated algorithms. Differences between the fixed point computations
are relatively minor (often algorithms differ because ad hoc approximations
are introduced for efficiency reasons). For this reason it is possible to
speak of an idealized generic algorithm or “engine” that encompasses the
essence of the fixed point computations.

In this chapter we develop a new engine for logic program analysis. As
instances of this engine, we obtain analysis algorithms that combine the
ability of set based analysis to reason about structural information with
the ability of abstract interpretation to reason about dependencies between
variables. The new engine is based upon the use of unfold transformations.
While the standard engine iterates over fixed semantic equations, this
unfolding engine iterates over dynamically changing equations. The main
result shows that the unfolding engine is uniformly more accurate than the
standard engine in the sense that, given an abstract domain, the output of
the unfolding engine, for any program, is more accurate than that of the
standard engine. We remark that this chapter contains essentially the same
material as [26].


10.2  Collecting  Semantics 


Most program analyses start with what is essentially a representation of an
approximation of a program’s collecting semantics. This is obtained by
starting with semantic equations that characterize the collecting
semantics, and then replacing exact operations in these equations by
approximate operations. The analysis then proceeds by using these
approximate semantic equations. In contrast, the unfolding engine starts
with a representation of the collecting semantics that is exact, and
changes the representation during its execution.

We begin by describing the initial representation of the collecting
semantics. Again, the representation employs constraints rather than
equations. These constraints are similar to the environment constraints
described in Chapter 4. However the environment constraints provide a
characterization of the collecting semantics in terms of mappings from
program variables to values (ground terms), and so they are not well suited
to studying properties such as the instantiation level of variables.
Moreover, the environment constraints are based on a very abstract notion
of operational model that does not directly incorporate any notion of
equation solving or unification. Hence properties specific to the equation
solving process cannot be inferred.

The new representation of collecting semantics commits to a specific
operational model based on unification (more generally, it would be
appropriate to employ operational models parameterized by some notion of
constraint solving). We begin by considering constraints corresponding to
bottom-up execution. Consider Figure 10.1, which contains the standard
logic program for appending two lists. Now, to construct constraints to
capture the computed answer substitutions obtained from executing the body
of each rule in this program, first introduce set variables $\Psi^1$ and
$\Psi^2$ to represent the sets of answer substitutions for rules 1 and 2
respectively.

1.  app(nil,W,W).

2.  app(X.Xs,Y,X.Zs) ← app(Xs,Y,Zs).

Figure 10.1: Append Program

Then construct the following constraints:

\[
\begin{array}{lll}
\Psi^1 & \supseteq & \{\, mgu() \,\}\!\upharpoonright_{S_1} \\
\Psi^2 & \supseteq & \{\, mgu(app(Xs,Y,Zs) = |\theta(app(nil,W,W))|_{S_2})
                       : \theta \in \Psi^1 \,\}\!\upharpoonright_{S_2} \\
\Psi^2 & \supseteq & \{\, mgu(app(Xs,Y,Zs) = |\theta(app(X.Xs,Y,X.Zs))|_{S_2})
                       : \theta \in \Psi^2 \,\}\!\upharpoonright_{S_2}
\end{array}
\]


where $S_1$ and $S_2$ denote the sets of variables in rules 1 and 2
respectively. The function $mgu$ denotes the unification algorithm at hand
(we shall assume a fixed unification algorithm throughout this chapter),
which maps a set of equations to its most general unifier. If the set of
equations is empty, then the identity substitution is returned. Where $S$
is a set of program variables, the function $\upharpoonright_S$ restricts
the domain of a set of substitutions to the variables $S$. For example, if
$\theta$ maps X into b, Y into c and maps all other variables Z into Z,
then $\{\theta\}\!\upharpoonright_{\{X\}}$ consists of the single
substitution that maps X into b and is the identity on all other variables.
Finally, where $t$ is a term (constructed from predicates and function
symbols), $|t|_S$ denotes the renaming of the variables in $t$ into new
variables so that $var(|t|_S) \cap S = \{\}$.
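
The three operations just described (most general unifier, restriction and
renaming apart) can be sketched in Python. This is an illustrative
sketch and not the unification algorithm fixed by the chapter: variables are
represented as strings, other terms as tuples `(functor, arg1, ...)`, and
all function names are hypothetical.

```python
def is_var(t):
    """Variables are plain strings; other terms are (functor, arg1, ...)."""
    return isinstance(t, str)


def walk(t, subst):
    """Follow variable bindings until an unbound variable or a structure."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def resolve(t, subst):
    """Apply subst to t exhaustively."""
    t = walk(t, subst)
    if is_var(t):
        return t
    return (t[0],) + tuple(resolve(a, subst) for a in t[1:])


def term_vars(t):
    """The set of variables occurring in t."""
    if is_var(t):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))


def mgu(equations):
    """Most general unifier of a list of (lhs, rhs) pairs, or None."""
    subst, stack = {}, list(equations)
    while stack:
        s, t = stack.pop()
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if is_var(t) and not is_var(s):
            s, t = t, s                      # keep the variable on the left
        if is_var(s):
            if s in term_vars(resolve(t, subst)):
                return None                  # occurs check
            subst[s] = t
        elif s[0] == t[0] and len(s) == len(t):
            stack.extend(zip(s[1:], t[1:]))
        else:
            return None                      # functor or arity clash
    return subst


def restrict(subst, variables):
    """Restrict a substitution to the given variables."""
    images = {v: resolve(v, subst) for v in variables}
    return {v: image for v, image in images.items() if image != v}


def rename_apart(t, avoid):
    """Rename every variable of t to a fresh name outside `avoid`."""
    taken = set(avoid) | term_vars(t)
    mapping = {}
    for v in sorted(term_vars(t)):
        n = 1
        while f"{v}_{n}" in taken:
            n += 1
        mapping[v] = f"{v}_{n}"
        taken.add(mapping[v])
    return resolve(t, mapping)


# Second constraint for append, with the identity substitution from rule 1.
S2 = {"X", "Xs", "Y", "Zs"}
fact_head = rename_apart(("app", ("nil",), "W", "W"), S2)
theta = restrict(mgu([(("app", "Xs", "Y", "Zs"), fact_head)]), S2)
```

Here `theta` binds Xs to nil and binds Y and Zs to the same fresh variable,
which corresponds to the substitution [Xs ↦ nil, Y ↦ W, Zs ↦ W] up to
renaming.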


A model of such constraints maps both $\Psi^1$ and $\Psi^2$ into a set of
substitutions, such that each constraint is satisfied. Models are ordered
componentwise, so that $Model_1 \le Model_2$ iff
$Model_1(\Psi) \subseteq Model_2(\Psi)$ for each variable $\Psi$. As an
example,

\[
\begin{array}{lll}
\Psi^1 & \mapsto & \{ [W \mapsto W] \} \\
\Psi^2 & \mapsto & \left\{
  \begin{array}{l}
  [Xs \mapsto nil,\; Y \mapsto W,\; Zs \mapsto W], \\
  [Xs \mapsto X'.nil,\; Y \mapsto W,\; Zs \mapsto X'.W], \\
  [Xs \mapsto X'.X''.nil,\; Y \mapsto W,\; Zs \mapsto X'.X''.W], \\
  \ldots
  \end{array}
\right\}
\end{array}
\]

where $X', X'', \ldots$ are new variables, is a model of the above
constraints for the append program. It is in fact the least such model (up
to renaming of new variables). It coincides exactly with the program’s
meaning in the sense that the set for $\Psi^i$ consists of the computed
answer substitutions obtained when the body of rule $i$ is executed as a
goal.

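More formally, let $P$ be a logic program and introduce variables
$\Psi^\alpha$ for each rule label $\alpha$ in $P$. Recall that if $\alpha$
is a rule label, then $var(\alpha)$ denotes the set of variables in the
rule with label $\alpha$. The constraints for $P$ are defined as follows.


Definition 10 (Bottom-up Semantic Constraints) For each rule
$R^\alpha \in P$ with body $A_1^{\alpha_1},\ldots,A_n^{\alpha_n}$,
construct the constraints

\[
\Psi^\alpha \supseteq \left\{
  mgu\bigl(A_1 = |\theta_1(B_1)|_{var(\alpha)}, \ldots,
           A_n = |\theta_n(B_n)|_{var(\alpha)}\bigr)
  : \theta_1 \in \Psi^{\beta_1} \wedge \cdots \wedge
    \theta_n \in \Psi^{\beta_n}
\right\}\!\upharpoonright_{var(\alpha)}
\]

where the $\beta_i$, $i \ge 1$, range over rule labels such that
$B_i^{\beta_i}$ is a head atom in $P$ and $A_i$ and $B_i$ are compatible. □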


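Before providing more examples, we first introduce an abbreviation for the
right hand sides of these constraints. Specifically, we shall abbreviate
expressions of the form

\[
\left\{
  mgu\bigl(A_1 = |\theta_1(B_1)|_{var(\alpha)}, \ldots,
           A_n = |\theta_n(B_n)|_{var(\alpha)}\bigr)
  : \theta_i \in \Psi^{\beta_i},\; i = 1..n
\right\}\!\upharpoonright_{var(\alpha)}
\]

by $\langle [A_1 = B_1]_{\beta_1}, \ldots, [A_n = B_n]_{\beta_n}
\rangle_\alpha$. Such an expression is called a cluster.

For example, the constraints for the append program (Figure 10.1) can be
written as follows. Note that the right hand side of the first constraint
is an empty cluster.

\[
\begin{array}{lll}
\Psi^1 & \supseteq & \langle\rangle_1 \\
\Psi^2 & \supseteq & \langle [app(Xs,Y,Zs) = app(nil,W,W)]_1 \rangle_2 \\
\Psi^2 & \supseteq & \langle [app(Xs,Y,Zs) = app(X.Xs,Y,X.Zs)]_2 \rangle_2
\end{array}
\]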

In general, expressions such as $[s = t]_\alpha$ contained in clusters are
called groups, and the individual equations contained in groups are called
group equations. (Note that group equations are not pure equations; thus
$[s = t]_\alpha$ is different from $[t = s]_\alpha$.) The subscript
$\alpha$ indicates the rule $\alpha$, and is called the dependency of the
group.

Figure 10.2 contains another program and its semantic constraints,
illustrating clusters that contain more than one group.

1.  p(f(X)) ← p(X), q(Y), r(Y).

2.  p(b).

3.  q(W).

4.  r(c).

\[
\begin{array}{lll}
\Psi^1 & \supseteq & \langle [p(X) = p(f(X))]_1,\; [q(Y) = q(W)]_3,\;
                              [r(Y) = r(c)]_4 \rangle_1 \\
\Psi^1 & \supseteq & \langle [p(X) = p(b)]_2,\; [q(Y) = q(W)]_3,\;
                              [r(Y) = r(c)]_4 \rangle_1 \\
\Psi^2 & \supseteq & \langle\rangle_2 \\
\Psi^3 & \supseteq & \langle\rangle_3 \\
\Psi^4 & \supseteq & \langle\rangle_4
\end{array}
\]

Figure 10.2: Example Clusters Containing More Than One Group

In the least model, $\Psi^1$ is $\{[X \mapsto b, Y \mapsto c],\;
[X \mapsto f(b), Y \mapsto c],\; [X \mapsto f(f(b)), Y \mapsto c],\;
\ldots\}$.

We conclude by describing the semantic constraints for top-down
left-to-right execution. Again, let $P$ be a logic program and introduce
variables $\Psi^\alpha$ for each label $\alpha$ in $P$. Recall that if
$\alpha$ is a rule label, then $var(\alpha)$ denotes the set of variables
in the rule with label $\alpha$, and if $\alpha$ is a body atom label, then
$var(\alpha)$ denotes the set of variables in the rule that contains the
body atom with label $\alpha$. Also recall that goals are treated as rules
without heads. Using the abbreviated notation for clusters, the top-down
constraints for $P$ can be defined as follows.

Definition 20 (Top-Down Semantic Constraints) For each rule
$R^\alpha \in P$ with body $A_1^{\alpha_1},\ldots,A_n^{\alpha_n}$ and head
$A_0$ (if it exists), the top-down constraints for $P$ contain the
constraints

\[
\begin{array}{lll}
\Psi^{\alpha_1} & \supseteq & \langle [A_0 = B_0]_{\beta_0}
                              \rangle_{\alpha_1} \\
\Psi^{\alpha_2} & \supseteq & \langle [A_0 = B_0]_{\beta_0},\;
                              [A_1 = B_1]_{\beta_1} \rangle_{\alpha_2} \\
 & \vdots & \\
\Psi^{\alpha_n} & \supseteq & \langle [A_0 = B_0]_{\beta_0},\;
                              [A_1 = B_1]_{\beta_1}, \ldots,
                              [A_{n-1} = B_{n-1}]_{\beta_{n-1}}
                              \rangle_{\alpha_n} \\
\Psi^{\alpha} & \supseteq & \langle [A_0 = B_0]_{\beta_0},\;
                              [A_1 = B_1]_{\beta_1}, \ldots,
                              [A_n = B_n]_{\beta_n} \rangle_{\alpha}
\end{array}
\]

where $\beta_0$ ranges over body atom labels such that $B_0^{\beta_0}$ is a
body atom in $P$ and $A_0$ and $B_0$ are compatible, and the $\beta_i$,
$i \ge 1$, range over rule labels such that $B_i^{\beta_i}$ is a head atom
in $P$ and $A_i$ and $B_i$ are compatible. □


If the rule $R^\alpha$ is a goal, then $A_0$ does not exist, and the group
$[A_0 = B_0]_{\beta_0}$ is simply deleted from each expression. Figure 10.3
illustrates the construction of the top-down constraints using the append
program.

Consider the third rule of the program. Recalling the notation for program
points, point 4 denotes the point just before app(Xs,Y,Zs) is called, and
point 5 denotes the point after the body of the third rule has completed
execution. There are six constraints corresponding to this rule. Two of
them define the set of substitutions encountered at point 4 (and correspond
to the possibility that this rule can be called from two places in the
program), and four of them define the set of substitutions at point 5 (and
correspond to the four combinations of possible calls to this rule and
returns from app(Xs,Y,Zs)). Note that, strictly speaking, the second and
fourth constraints in this collection are omitted because app(nil,W,W) and
app(b.nil,c.nil,V) are not compatible.


10.3  Unfolding  Semantic  Equations 

A key idea behind our engine is the substitution of one cluster into
another. Such a substitution step may generate a new cluster. The engine is
essentially an exhaustive application of this step. We now elaborate.

2.  ← app(b.nil,c.nil,V)¹.

3.  app(nil,W,W).

5.  app(X.Xs,Y,X.Zs) ← app(Xs,Y,Zs)⁴.

\[
\begin{array}{lll}
\Psi^1 & \supseteq & \langle\rangle_1 \\
\Psi^2 & \supseteq & \langle [app(b.nil,c.nil,V) = app(nil,W,W)]_3
                     \rangle_2 \\
\Psi^2 & \supseteq & \langle [app(b.nil,c.nil,V) = app(X.Xs,Y,X.Zs)]_5
                     \rangle_2 \\
\Psi^3 & \supseteq & \langle [app(nil,W,W) = app(b.nil,c.nil,V)]_1
                     \rangle_3 \\
\Psi^3 & \supseteq & \langle [app(nil,W,W) = app(Xs,Y,Zs)]_4 \rangle_3 \\
\Psi^4 & \supseteq & \langle [app(X.Xs,Y,X.Zs) = app(b.nil,c.nil,V)]_1
                     \rangle_4 \\
\Psi^4 & \supseteq & \langle [app(X.Xs,Y,X.Zs) = app(Xs,Y,Zs)]_4
                     \rangle_4 \\
\Psi^5 & \supseteq & \langle [app(X.Xs,Y,X.Zs) = app(b.nil,c.nil,V)]_1,\;
                     [app(Xs,Y,Zs) = app(nil,W,W)]_3 \rangle_5 \\
\Psi^5 & \supseteq & \langle [app(X.Xs,Y,X.Zs) = app(b.nil,c.nil,V)]_1,\;
                     [app(Xs,Y,Zs) = app(X.Xs,Y,X.Zs)]_5 \rangle_5 \\
\Psi^5 & \supseteq & \langle [app(X.Xs,Y,X.Zs) = app(Xs,Y,Zs)]_4,\;
                     [app(Xs,Y,Zs) = app(nil,W,W)]_3 \rangle_5 \\
\Psi^5 & \supseteq & \langle [app(X.Xs,Y,X.Zs) = app(Xs,Y,Zs)]_4,\;
                     [app(Xs,Y,Zs) = app(X.Xs,Y,X.Zs)]_5 \rangle_5
\end{array}
\]

Figure 10.3: Top-Down Semantics Constraints for Append

Consider a group $\mathcal{G}$ whose dependency $\alpha$ is a fact, that
is, the rule with label $\alpha$ has no body. Since the equation for rule
$\alpha$ is of the form $\Psi^\alpha \supseteq \langle\rangle_\alpha$, any
model of the semantic constraints must map $\Psi^\alpha$ to the singleton
set consisting of the identity substitution. Hence $\mathcal{G}$ has a
fixed interpretation in any model, and can be thought of as representing a
fixed set of substitutions. Call such groups $\mathcal{G}$ terminal. A
group is in substitution form if the left hand side of each equation
therein is a variable; a cluster is in substitution form if each of its
groups is in substitution form.


We illustrate informally how such clusters are used as substitutions by
returning to the bottom-up constraints for the append program (Figure
10.1). By rewriting equations such as $p(s_1,\ldots,s_n) =
p(t_1,\ldots,t_n)$ into $s_1 = t_1, \ldots, s_n = t_n$, these constraints
may be simplified as follows.

\[
\begin{array}{lll}
\Psi^1 & \supseteq & \langle\rangle_1 \\
\Psi^2 & \supseteq & \langle [Xs = nil,\; Y = W,\; Zs = W]_1 \rangle_2 \\
\Psi^2 & \supseteq & \langle [Xs = X.Xs,\; Y = Y,\; Zs = X.Zs]_2 \rangle_2
\end{array}
\]

Denote the two clusters in the constraints for $\Psi^2$ by $cl_1$ and
$cl_2$ respectively. $cl_1$ contains a terminal group $\mathcal{G}_1$ that
represents the substitution $[Xs \mapsto nil, Y \mapsto W, Zs \mapsto W]$.
$cl_2$ contains a group $\mathcal{G}_2$ that is non-terminal. By
substituting $cl_1$ into $cl_2$, we obtain the new cluster:

\[
\langle [Xs = X.nil,\; Y = W,\; Zs = X.W]_1 \rangle_2 \tag{10.40}
\]

and this can be used to construct a new constraint for $\Psi^2$:

\[
\Psi^2 \supseteq \langle [Xs = X.nil,\; Y = W,\; Zs = X.W]_1 \rangle_2
\tag{10.41}
\]

The correctness of a substitution step follows from the fact that it is
essentially an “unfolding” operation using the definition of Xs, Y, and Zs.
That is, the new expression (10.40) is subsumed by $cl_2$ in the following
sense: in all models $\mathcal{I}$ of the original constraints,
$\mathcal{I}((10.40)) \subseteq \mathcal{I}(cl_2)$. Hence the addition of
the constraint (10.41) does not change the least model of the constraints.

However, the new equation is more explicit in the sense that it displays
more elements of the set $\Psi^2$ (as terminal groups). Clearly the process
of unfolding of constraints can be repeated indefinitely to obtain new
clusters $\langle [Xs = X.X'.nil,\; Y = W,\; Zs = X.X'.W]_1 \rangle_2$,
..., etc. Note that each such cluster represents an answer substitution for
the body of the second rule of the append program. To obtain a terminating
analysis algorithm, this process of adding new clusters must be curtailed
by using a notion of approximation. Consider approximating the new groups
constructed during the substitution step. Each such group is terminal and
hence defines a fixed set of substitutions. Hence any traditional means for
approximating substitution sets can be employed (see, for example, [36]).
We present one formalization as follows.


Definition 21 An abstract domain $\mathcal{D}$ is a set of abstract
formulas together with (a) an abstraction function ABS that maps a set of
variables and a set of substitutions into an abstract formula, and (b) a
concretization function CONC that maps an abstract formula into a set of
substitutions such that $\mathrm{CONC}(\mathrm{ABS}(S, \Theta)) \supseteq
\Theta$ for all $\Theta$ and finite sets $S$ of variables. (Typically, some
further algebraic conditions are specified, but these will not concern us
here.) The definition of ABS and CONC induces a notion of abstract
unification¹

\[
\mathcal{D}\text{-}unify(S,s,t,A) \stackrel{\mathrm{def}}{=}
\mathrm{ABS}\bigl(S, \{ mgu(s = |\theta(t)|_{var(s)}) :
\theta \in \mathrm{CONC}(A) \}\bigr)
\]

where $s$ and $t$ are terms, $S$ is a set of program variables and
$A \in \mathcal{D}$. □

¹ For pragmatic reasons, some analysis algorithms do not implement
$\mathcal{D}$-unify exactly, but instead use a conservative approximation.

As an example, consider the propositional formula domain used by Marriott
and Søndergaard to capture relationships between groundness of variables
[44]. Specifically, let $\mathcal{D}_{prop}$ consist of all propositional
formulas over program variables. For example, $X$, $X \wedge Y$,
$X \vee Y$, $X \Leftrightarrow Y$ are formulas in $\mathcal{D}_{prop}$. In
essence, each formula $A$ denotes the set of substitutions that satisfies
$A$. If $A$ is a variable, say $X$, then $\theta$ satisfies $A$ if
$\theta(X)$ is a ground term. The definition of satisfies is extended to
non-variable formulas using the interpretation of the propositional
connectives. For example, the proposition $X \vee Y$ is true in $\theta$ if
$\theta$ grounds either $X$ or $Y$. Similarly $X \wedge Y$ is true if
$\theta$ grounds both $X$ and $Y$, and $X \Leftrightarrow Y$ is true if
$\theta$ either grounds both $X$ and $Y$ or grounds neither of them. CONC
and ABS can now be defined as follows:

\[
\begin{array}{lll}
\mathrm{CONC}(A) & \stackrel{\mathrm{def}}{=} &
  \{ \theta : \theta \text{ satisfies } A \} \\
\mathrm{ABS}(S, \Theta) & \stackrel{\mathrm{def}}{=} &
  \text{the strongest proposition } A \text{ such that }
  var(A) \subseteq S \text{ and } A \text{ is true for all }
  \theta \in \Theta
\end{array}
\]

For instance, $mgu(X = f(a,Y))$ can be approximated with respect to
variables $\{X,Y\}$ by the formula $X \Leftrightarrow Y$; similarly
$mgu(Xs = X.nil, Y = W, Zs = X.W)$ with respect to $\{Xs,Y,Zs\}$ can be
approximated by $(Xs \wedge Y \Leftrightarrow Zs)$.
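
The two approximations above can be reproduced by a small Python sketch.
It rests on an assumption that is not spelled out in the text: the formula
is required to hold for the substitution and for all of its instances, as
is usual for groundness dependency domains. Under that assumption a program
variable is ground exactly when every variable in its binding is ground, so
the formula can be represented by its set of models (truth assignments).
The representation and the name `abs_prop` are hypothetical.

```python
from itertools import product


def is_var(t):
    """Variables are plain strings; other terms are (functor, arg1, ...)."""
    return isinstance(t, str)


def term_vars(t):
    """The set of variables occurring in t."""
    if is_var(t):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))


def abs_prop(variables, theta):
    """Groundness abstraction of a single substitution.

    Returns the sorted variable list and the set of truth assignments
    (one boolean per variable) that the abstract formula admits.
    """
    variables = sorted(variables)
    bindings = {v: term_vars(theta.get(v, v)) for v in variables}
    range_vars = sorted(set().union(*bindings.values()))
    models = set()
    # Try every way of grounding the variables in the range of theta.
    for bits in product([False, True], repeat=len(range_vars)):
        ground = dict(zip(range_vars, bits))
        row = tuple(all(ground[w] for w in bindings[v]) for v in variables)
        models.add(row)
    return variables, models


# mgu(X = f(a,Y)) with respect to {X, Y}
_, models_xy = abs_prop({"X", "Y"}, {"X": ("f", ("a",), "Y")})

# mgu(Xs = X.nil, Y = W, Zs = X.W) with respect to {Xs, Y, Zs}
_, models_app = abs_prop(
    {"Xs", "Y", "Zs"},
    {"Xs": (".", "X", ("nil",)), "Y": "W", "Zs": (".", "X", "W")},
)
```

The models computed for the first example are exactly those of X ⇔ Y, and
those for the second are exactly the assignments satisfying
(Xs ∧ Y) ⇔ Zs.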


Returning to the bottom-up semantic constraints for append, we recall that
the unfolding process may generate an unbounded number of new clusters.
This can be curtailed using the abstract domain $\mathcal{D}_{prop}$ in a
straightforward way: approximate any new groups $\mathcal{G}$ constructed
during the unfolding step using ABS. Hence, instead of producing the
constraint (10.41), the first unfolding step now produces the constraint

\[
\Psi^2 \supseteq \langle [Xs \wedge Y \Leftrightarrow Zs]_1 \rangle_2
\]

Note, however, that this approximation of the unfolding step loses a
significant amount of structural information. In fact, when a group
$\mathcal{G}_1$ is substituted into a group $\mathcal{G}_2$, the resulting
cluster loses all information about these two groups except for that
contained in the $\mathcal{D}$-approximation. A key observation of this
paper is that the group $\mathcal{G}_2$ can in fact be retained.
Specifically, we define that the substitution of $\mathcal{G}_1$ into
$\mathcal{G}_2$, which originally yielded a single group
$[Xs \wedge Y \Leftrightarrow Zs]$, now yields two groups:

\[
[Xs \wedge Y \Leftrightarrow Zs]_1 \qquad\text{and}\qquad
[Xs = X.Xs,\; Y = Y,\; Zs = X.Zs]_2^{®}
\]

In other words, the group $\mathcal{G}_2$ remains after the substitution,
and is marked as a residual group (indicated with the superscript ®). It is
these residual groups that provide our engine with the ability to reason
about structures.


10.4  The Engine


We begin with some definitions. We write $\bar{s}$ to denote a sequence of
terms $s_1,\ldots,s_n$. Similarly, if $\bar{s}$ denotes $s_1,\ldots,s_n$
and $\bar{t}$ denotes $t_1,\ldots,t_n$, we write $\bar{s} = \bar{t}$ to
denote the sequence of equations $s_1 = t_1,\ldots,s_n = t_n$. An exact
group is of the form $[s_1 = t_1,\ldots,s_n = t_n]_\alpha$ where the $s_i$
and $t_i$ are program terms (constructed from function and predicate
symbols and program variables). We shall frequently write exact groups
using the sequence notation as $[\bar{s} = \bar{t}]_\alpha$. An exact group
is terminal if the rule with label $\alpha$ is a fact. Some exact groups
may be marked residual during execution of the engine. An approximate group
is defined with respect to a given abstract domain $\mathcal{D}$; it is
simply of the form $[A]_\alpha$ where $A \in \mathcal{D}$. A group is an
exact or approximate group. A cluster is of the form
$\langle \mathcal{G}_1,\ldots,\mathcal{G}_n \rangle_\alpha$ where each
$\mathcal{G}_i$ is a group. In the previous section, semantic constraints
were constructed from a program, and these form the starting point of the
unfolding engine. At all times, corresponding to each rule $\alpha$ there
is a collection of constraints of the form $\Psi^\alpha \supseteq cl$ where
$cl$ is a cluster and $\Psi^\alpha$ is the distinguished set variable
corresponding to rule label $\alpha$.

Let $\mathcal{G}$ be the exact group $[\bar{s} = \bar{t}]_\alpha$.
$\mathcal{G}$ is in substitution form if $\bar{s}$ is a sequence of
variables. Now, suppose that $\mathcal{G}$ is in substitution form and let
$X_1,\ldots,X_m$ be a listing of the variables in $var(\bar{s})$. Now
define $subs(\mathcal{G})$ to be the set of all substitutions of the form
$[X_1 \mapsto u_1,\ldots,X_m \mapsto u_m]$ such that $X_i = u_i$ appears in
$\mathcal{G}$, $i = 1..m$. In essence, $subs(\mathcal{G})$ consists of all
substitutions defined by taking subsets of the equations in $\mathcal{G}$
such that each subset contains exactly one equation for each variable
$X_i$, $i = 1..m$. A cluster is in substitution form if each group therein
is either an approximate group, or a terminal group in substitution form,
or is marked as a residual group. An exact group can be made into
substitution form by deleting certain equations:

\[
\lfloor \mathcal{G} \rfloor_\alpha \stackrel{\mathrm{def}}{=}
[\, X = t : (X = t) \in \mathcal{G} \text{ and } X \text{ is a variable}
\,]_\alpha.
\]

This operator will be used exclusively in the construction of residual
groups.
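
The definitions of subs and of the deletion operator can be sketched in
Python as follows. The encoding is an assumption for illustration: a group
is a list of `(lhs, rhs)` equations, variables are strings, other terms are
tuples, and the function names are hypothetical.

```python
from itertools import product


def is_var(t):
    """Variables are plain strings; other terms are (functor, arg1, ...)."""
    return isinstance(t, str)


def to_substitution_form(group):
    """Delete every equation whose left hand side is not a variable."""
    return [(s, t) for (s, t) in group if is_var(s)]


def subs(group):
    """All substitutions that pick exactly one equation per variable.

    `group` must be in substitution form.
    """
    choices = {}
    for x, u in group:
        choices.setdefault(x, []).append(u)
    names = list(choices)
    return [
        dict(zip(names, picked))
        for picked in product(*(choices[name] for name in names))
    ]


group = [("X", ("a",)), ("X", ("b",)), ("Y", "W")]
```

For the sample `group`, `subs` yields two substitutions, one for each of
the two equations for X, each paired with the single equation for Y.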

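The basic operation of the unfolding engine is the notion of substituting a
cluster in substitution form into a group. What results is a new cluster.
First define the composition of an exact group with an approximate group as
follows:

\[
[\bar{s} = \bar{t}]_\alpha \circ [A]_\beta \stackrel{\mathrm{def}}{=}
[\mathcal{D}\text{-}unify(S, \bar{s}, \bar{t}, A)]_\beta
\]

where $S$ denotes $var(\bar{s})$ and
$\mathcal{D}\text{-}unify(S,\bar{s},\bar{t},A)$ denotes the extension of
$\mathcal{D}$-unify to sequences:

\[
\mathcal{D}\text{-}unify(S, \bar{s}, \bar{t}, A)
\stackrel{\mathrm{def}}{=}
\mathrm{ABS}\bigl(S, \{ mgu(\bar{s} = |\theta(\bar{t})|_{var(\bar{s})}) :
\theta \in \mathrm{CONC}(A) \}\bigr).
\]

Here $|\theta(\bar{t})|_{var(\bar{s})}$ renames the sequence
$\theta(\bar{t})$ so that it has no variables in common with
$var(\bar{s})$.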
common  with  var(l). 

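Similarly, the composition of an exact group with an exact group in substitution form can be defined:

$$[\mathcal{E}]_{\alpha} \circ [\tilde{X} = \tilde{u}]_{\beta} \stackrel{\mathrm{def}}{=} [\, s = t\theta : (s = t) \in \mathcal{E} \text{ and } \theta \in subs([\tilde{X} = \lceil \tilde{u} \rceil_{S}]_{\beta}) \,]_{\beta}$$

where $S$ denotes the variables in the right hand sides of the equations in $\mathcal{E}$, that is $S = \bigcup_{(s = t) \in \mathcal{E}} var(t)$. Intuitively, the renaming operation $\lceil \tilde{u} \rceil_{S}$ in this definition is required because there is a renaming operation implicit in the interpretation of both of the exact groups $[\mathcal{E}]_{\alpha}$ and $[\tilde{X} = \tilde{u}]_{\beta}$, but there is only one such operation in the resulting group. In particular, without this renaming, $[X = f(W,Y)] \circ [Y = W]$ would incorrectly yield $[X = f(W,W)]$.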
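The role of the renaming can be checked with a small Python sketch of this composition. The term representation (variables as strings, other terms as functor-headed tuples), the helper names, and the priming scheme for fresh variables are all assumptions of this illustration, not the thesis's implementation.

```python
from itertools import product

def is_var(t):
    return isinstance(t, str)

def term_vars(t):
    if is_var(t):
        return {t}
    found = set()
    for arg in t[1:]:
        found |= term_vars(arg)
    return found

def substitute(theta, t):
    if is_var(t):
        return theta.get(t, t)
    return (t[0],) + tuple(substitute(theta, a) for a in t[1:])

def rename_apart(terms, avoid):
    """Consistently rename variables of `terms` that clash with `avoid`."""
    taken = set(avoid)
    for t in terms:
        taken |= term_vars(t)
    renaming = {}
    for t in terms:
        for v in sorted(term_vars(t)):
            if v in avoid and v not in renaming:
                fresh = v + "'"
                while fresh in taken:
                    fresh += "'"
                taken.add(fresh)
                renaming[v] = fresh
    return [substitute(renaming, t) for t in terms]

def subs(equations):
    choices = {}
    for var, term in equations:
        choices.setdefault(var, []).append(term)
    names = list(choices)
    return [dict(zip(names, p)) for p in product(*(choices[v] for v in names))]

def compose(exact, subst_group, rename=True):
    """Compose an exact group with an exact group in substitution form."""
    rhs_vars = set()
    for _, t in exact:
        rhs_vars |= term_vars(t)           # the set S of the definition
    rhs = [u for _, u in subst_group]
    if rename:
        rhs = rename_apart(rhs, rhs_vars)  # rename u apart from S
    renamed = [(x, u) for (x, _), u in zip(subst_group, rhs)]
    out = []
    for theta in subs(renamed):
        for s, t in exact:
            eq = (s, substitute(theta, t))
            if eq not in out:
                out.append(eq)
    return out

with_renaming = compose([("X", ("f", "W", "Y"))], [("Y", "W")])
without_renaming = compose([("X", ("f", "W", "Y"))], [("Y", "W")], rename=False)
```

With renaming the result is the single equation X = f(W, W'), which keeps the two occurrences distinct; with renaming disabled the two variables are wrongly identified as X = f(W, W).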

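We also define an approximate notion of the composition of an exact group with an exact group in substitution form. Specifically, where $[\mathcal{E}_1]_{\alpha}$ is an exact group and $[\mathcal{E}_2]_{\beta}$ is an exact group in substitution form, define that the approximate composition $[\mathcal{E}_1]_{\alpha} \mathbin{\hat{\circ}} [\mathcal{E}_2]_{\beta}$ consists of two groups. The first group is simply the $\mathcal{D}$-approximation of $[\mathcal{E}_1]_{\alpha} \circ [\mathcal{E}_2]_{\beta}$:

$$[\, \mathrm{ABS}(S, mgu([\mathcal{E}_1]_{\alpha} \circ [\mathcal{E}_2]_{\beta})) \,]_{\beta}$$

where $\mathcal{E}_1$ is $\tilde{s} = \tilde{t}$ and $S$ is $var(\tilde{s})$. The second group is a subset of $[\mathcal{E}_1]_{\alpha} \circ [\mathcal{E}_2]_{\beta}$ selected by the function select. Specifically, if $\mathcal{E}_2$ is $\tilde{X} = \tilde{u}$, then $select([\mathcal{E}_1]_{\alpha}, [\mathcal{E}_2]_{\beta})$ is defined to be

$$[\, s = t\theta : (s = t) \in \mathcal{E}_1 \text{ and } \theta \in subs([\tilde{X} = \lceil \tilde{u} \rceil_{S}]_{\beta}) \text{ and either } t \text{ is a variable or } |t\theta| = |t| \,]_{\beta}$$

where $S$ denotes the variables in the right hand sides of the equations in $\mathcal{E}_1$, and $|t|$ is the number of symbols in the term $t$. Intuitively, select chooses equations from $[\mathcal{E}_1]_{\alpha} \circ [\mathcal{E}_2]_{\beta}$ in such a way that the size of any term therein does not exceed the size of the terms in the original equations $[\mathcal{E}_1]_{\alpha}$ and $[\mathcal{E}_2]_{\beta}$. For example, if $[\mathcal{E}_1]_{\alpha}$ and $[\mathcal{E}_2]_{\beta}$ are $[X = W_1, Y = f(W_1), Z = f(W_2)]_{\alpha}$ and $[W_1 = g(a), W_2 = V]_{\beta}$ respectively, then $[\mathcal{E}_1]_{\alpha} \circ [\mathcal{E}_2]_{\beta}$ is $[X = g(a), Y = f(g(a)), Z = f(V)]_{\beta}$. However $select([\mathcal{E}_1]_{\alpha}, [\mathcal{E}_2]_{\beta})$ will contain the first and third equations, but not the second because $f(g(a))$ is too big. As we see later, it is possible to consider other definitions of select.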
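The size test performed by select can be reproduced on the example above with a short Python sketch. It assumes the same illustrative term representation (variables as strings, other terms as functor-headed tuples) and assumes the two groups are already renamed apart, so the renaming step is omitted; the function names are hypothetical.

```python
from itertools import product

def is_var(t):
    return isinstance(t, str)

def size(t):
    """Number of symbols in a term; a variable counts as one symbol."""
    if is_var(t):
        return 1
    return 1 + sum(size(a) for a in t[1:])

def substitute(theta, t):
    if is_var(t):
        return theta.get(t, t)
    return (t[0],) + tuple(substitute(theta, a) for a in t[1:])

def subs(equations):
    choices = {}
    for var, term in equations:
        choices.setdefault(var, []).append(term)
    names = list(choices)
    return [dict(zip(names, p)) for p in product(*(choices[v] for v in names))]

def select(exact, subst_group):
    """Keep s = t*theta only if t is a variable or the term size is unchanged."""
    out = []
    for theta in subs(subst_group):
        for s, t in exact:
            t_theta = substitute(theta, t)
            if is_var(t) or size(t_theta) == size(t):
                if (s, t_theta) not in out:
                    out.append((s, t_theta))
    return out

e1 = [("X", "W1"), ("Y", ("f", "W1")), ("Z", ("f", "W2"))]
e2 = [("W1", ("g", ("a",))), ("W2", "V")]
kept = select(e1, e2)
```

The equation for Y is dropped because f(g(a)) has three symbols while f(W1) has two; the equations for X and Z survive.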

We  are  now  in  a  position  to  define  the  main  step  of  our  engine. 

Definition 22 (Unfolding Step) Let $cl_1$ be a cluster containing a non-terminal group $\mathcal{G}$. Let $cl_2$ be a cluster in substitution form. Construct a new cluster by replacing $\mathcal{G}$ in $cl_1$ by all of the following groups:

• $\lfloor \mathcal{G} \rfloor$, and this group is marked as residual;

• $\mathcal{G} \circ [A]$, for all approximate groups $[A] \in cl_2$, and

• $\mathcal{G} \mathbin{\hat{\circ}} [\mathcal{E}]_{\alpha}$, for all exact groups $[\mathcal{E}]_{\alpha} \in cl_2$.

We say that this new cluster is the result of substituting $cl_2$ into $cl_1$ at $\mathcal{G}$.

□

For example, let $cl_1$ and $cl_2$ be the following two clusters (where, for simplicity, we have omitted the subscripts on clusters and groups):

$$cl_1 = \big(\, [X = f(Y)] \,\big) \qquad cl_2 = \big(\, [Y = U,\; Y = g(b),\; Z = c],\;\; [Y = V] \,\big)$$

The substitution of $cl_2$ into $cl_1$ at the group $[X = f(Y)]$, using the abstract domain $\mathcal{D}_{prop}$, results in the following cluster:

$$\big(\, [X = f(Y)]^{R},\;\; [X = f(U)],\;\; [X],\;\; [X = f(V)],\;\; [true] \,\big)$$

The first group $[X = f(Y)]$ in this cluster is residual. The second group $[X = f(U)]$ is obtained from $select([X = f(Y)] \circ [Y = U, Y = g(b), Z = c])$. (Note that the equation $X = f(g(b))$ is omitted by select.) The third group $[X]$ is approximate, and is obtained from $\mathrm{ABS}(\{X\}, mgu([X = f(Y)] \circ [Y = U, Y = g(b), Z = c]))$. It contains the proposition that $X$ is ground. The fourth group $[X = f(V)]$ is obtained from $select([X = f(Y)] \circ [Y = V])$. Finally, the fifth group $[true]$ is approximate and is obtained from $\mathrm{ABS}(\{X\}, mgu([X = f(Y)] \circ [Y = V]))$.

We  now  define  how  unfolding  steps  are  applied  to  a  collection  of  semantic 
constraints.  Let  C  be  a  collection  of  semantic  constraints  and  consider  the 
following  transformation  step. 

Transformation 19 (Unfolding) If $C$ contains the constraints $\vartheta^{\alpha} \supseteq cl_1$ and $\vartheta^{\beta} \supseteq cl_2$ such that $cl_1$ contains a group $[\mathcal{E}]_{\beta}$ and $cl_2$ is in substitution form, then let $cl_3$ be the result of substituting $cl_2$ into $cl_1$ at the group $[\mathcal{E}]_{\beta}$, and construct the constraint $\vartheta^{\alpha} \supseteq cl_3$. If this constraint does not appear in $C$, then the unfolding step is said to be effective, and the constraint is added to $C$.

The  complete  engine  is  presented  in  figure  10.4.  It  is  essentially  an  ex¬ 
haustive  application  of  the  unfolding  step  just  defined.  Note  that  before  an 
unfolding  step  is  performed,  it  is  necessary  to  perform  some  basic  simplifi¬ 
cation  steps.  These  straightforward  simplifications^  are: 

• replace a group equation of the form $f(s_1, \ldots, s_n) = f(t_1, \ldots, t_n)$ by $s_1 = t_1, \ldots, s_n = t_n$;

• replace any group of the form $[s_1 = t_1, \ldots, s_n = t_n]$, where one of the equations is of the form $f(u_1, \ldots, u_k) = W$ and $W$ is a head-only variable, by $[s_1 = t_1\theta, \ldots, s_n = t_n\theta]$ where $\theta$ is the substitution $[W \mapsto f(W_1, \ldots, W_k)]$ and the variables $W_1, \ldots, W_k$ are new variables. (Intuitively, a group such as $[f(X) = W, Y = W]$ does define a substitution, but is not in substitution form. This simplification step remedies the problem by transforming it into $[f(X) = f(W_1), Y = f(W_1)]$, which in turn can be simplified into $[X = W_1, Y = f(W_1)]$.)
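The two simplification steps can be sketched in Python on the example just given. The term representation (variables as strings, other terms as functor-headed tuples), the function names, and the naming scheme `W_1, W_2, ...` for new variables are assumptions of this illustration; the sketch also assumes the generated names do not clash with existing variables.

```python
def is_var(t):
    return isinstance(t, str)

def substitute(theta, t):
    if is_var(t):
        return theta.get(t, t)
    return (t[0],) + tuple(substitute(theta, a) for a in t[1:])

def decompose(equations):
    """Replace f(s1..sn) = f(t1..tn) by s1 = t1, ..., sn = tn, repeatedly."""
    out = []
    pending = list(equations)
    while pending:
        s, t = pending.pop(0)
        if (not is_var(s) and not is_var(t)
                and s[0] == t[0] and len(s) == len(t)):
            pending = list(zip(s[1:], t[1:])) + pending
        else:
            out.append((s, t))
    return out

def expand_head_only(equations, w, functor, arity):
    """Apply [w -> functor(w_1, ..., w_arity)] to every right-hand side."""
    fresh = tuple(f"{w}_{i}" for i in range(1, arity + 1))
    theta = {w: (functor,) + fresh}
    return [(s, substitute(theta, t)) for s, t in equations]

# [f(X) = W, Y = W] with W a head-only variable.
group = [(("f", "X"), "W"), ("Y", "W")]
expanded = expand_head_only(group, "W", "f", 1)
simplified = decompose(expanded)
```

After expansion the group reads [f(X) = f(W_1), Y = f(W_1)], and decomposition turns it into [X = W_1, Y = f(W_1)], which is in substitution form.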

^We also observe that we can effectively ignore any constraint of the form $\vartheta^{\alpha} \supseteq cl$ such that $cl$ contains either (a) a group equation of the form $f(\tilde{s}) = g(\tilde{t})$, $f \neq g$, or (b) two group equations of the form $X = f(\tilde{s})$ and $X = g(\tilde{t})$, $f \neq g$.

input an arbitrary program P;
obtain the semantic constraints associated with P;
repeat
    exhaustively apply the basic simplification steps;
    apply an effective unfolding step if possible;
until the constraints do not change;
delete all residual groups^;
output all constraints involving substitution form clusters;


Figure  10.4:  The  Unfolding  Engine 


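1. ← p(f(X)), q(X,Y).
2. p(U) ← r(U).
3. q(g(W1,W2),W1).
4. r(g(c,W3)).
5. r(f(V)) ← r(V).

$\vartheta^{1} \supseteq ([f(X) = U]_2,\; [X = g(W1,W2),\, Y = W1]_3)^{(a)}$
$\vartheta^{2} \supseteq ([U = g(c,W3)]_4)^{(b)}$
$\vartheta^{2} \supseteq ([U = f(V)]_5)^{(c)}$
$\vartheta^{3} \supseteq (\,)$
$\vartheta^{4} \supseteq (\,)$
$\vartheta^{5} \supseteq ([V = g(c,W3)]_4)^{(d)}$
$\vartheta^{5} \supseteq ([V = f(V)]_5)^{(e)}$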

Figure  10.5:  Example  to  Illustrate  Execution  of  Unfolding  Engine 


When no more effective unfolding steps can be performed, all clusters in substitution form are output after first deleting any residual groups. Such clusters represent fixed sets of substitutions because the groups they contain are independent of the values given to the variables. The important point here is that the output is an explicit representation of a set of substitutions for each variable $\vartheta^{\alpha}$. A post-processing phase may then be applied to extract the specific information sought.

^These groups in fact do contain some structural information. We address the issue of extracting this information in the next section. For now, we shall simply ignore them.

Consider the program and its (simplified) semantic constraints shown in Figure 10.5. The labels (a), (b), ... are used to identify clusters in the following discussion. Two of the clusters in the constraints are in substitution form, namely (b) and (d). Using these clusters, the following three unfolding steps can be performed: cluster (b) can be substituted into (a) and cluster (d) can be substituted into (c) and (e). These three unfolding steps respectively result in the following new constraints.

$\vartheta^{1} \supseteq ([f(X) = g(c,W3)]_4,\; [X = g(W1,W2),\, Y = W1]_3)^{(f)}$
$\vartheta^{2} \supseteq ([U = f(V)]^{R}_5,\; [true]_4)^{(g)}$
$\vartheta^{5} \supseteq ([V = f(V)]^{R}_5,\; [true]_4)^{(h)}$

In the first of these unfolding steps, there is no residual group in (f) because $\lfloor [\mathcal{E}] \rfloor = [\,]$ when $[\mathcal{E}] = [f(X) = U]$. The new constraint involving (f) is vacuous because it contains the conflicting group equation $f(X) = g(c,W3)$.


Next, clusters (g) and (h) can be used in unfolding steps. We omit the details for (h) since these steps do not lead to any new constraints. Substituting (g) into (a) yields the constraint

$\vartheta^{1} \supseteq ([f(X) = f(V)]_5,\; [true]_4,\; [X = g(W1,W2),\, Y = W1]_3)^{(i)}$

which is subsequently simplified into

$\vartheta^{1} \supseteq ([X = V]_5,\; [true]_4,\; [X = g(W1,W2),\, Y = W1]_3)^{(j)}$

Now, there are two possible substitutions into this new cluster: (d) into (j) and (h) into (j). These two substitution steps respectively yield

$\vartheta^{1} \supseteq ([X = V]^{R}_5,\; [X = g(c,W3)]_4,\; [true]_4,\; [X = g(W1,W2),\, Y = W1]_3)^{(k)}$
$\vartheta^{1} \supseteq ([X = V]^{R}_5,\; [X = f(V)]_5,\; [true]_4,\; [X = g(W1,W2),\, Y = W1]_3)^{(l)}$

The second constraint is vacuous because it contains the conflicting equations $X = f(V)$ and $X = g(W1,W2)$. At this point the engine terminates, and outputs the constraints involving the following clusters: (b), (d), (g), (h), and (k). The only output constraint for $\vartheta^{1}$ is that involving cluster (k). This cluster represents the fixed collection of substitutions given by $mgu(X = g(c,W3), X = g(W1,W2), Y = W1)$ conjoined with true. This in turn can be simplified and projected onto $\{X, Y\}$ to obtain $\{(X \mapsto g(c,t), Y \mapsto c) : t \text{ is any term}\}$ from which we can deduce that $Y$ is ground.
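The final deduction can be checked mechanically. The following Python sketch computes a most general unifier of the three equations X = g(c,W3), X = g(W1,W2), Y = W1 and reads off the bindings. It is a textbook unification routine written for this illustration (variables as strings, other terms as functor-headed tuples), not the engine's implementation.

```python
def is_var(t):
    return isinstance(t, str)

def walk(t, theta):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while is_var(t) and t in theta:
        t = theta[t]
    return t

def occurs(v, t, theta):
    t = walk(t, theta)
    if is_var(t):
        return v == t
    return any(occurs(v, a, theta) for a in t[1:])

def unify(equations):
    """Return an mgu as a (triangular) dict of bindings, or None on conflict."""
    theta = {}
    stack = list(equations)
    while stack:
        s, t = stack.pop()
        s, t = walk(s, theta), walk(t, theta)
        if s == t:
            continue
        if is_var(s):
            if occurs(s, t, theta):
                return None
            theta[s] = t
        elif is_var(t):
            if occurs(t, s, theta):
                return None
            theta[t] = s
        elif s[0] == t[0] and len(s) == len(t):
            stack.extend(zip(s[1:], t[1:]))
        else:
            return None
    return theta

def resolve(t, theta):
    """Apply the bindings exhaustively to a term."""
    t = walk(t, theta)
    if is_var(t):
        return t
    return (t[0],) + tuple(resolve(a, theta) for a in t[1:])

def is_ground(t):
    return (not is_var(t)) and all(is_ground(a) for a in t[1:])

c = ("c",)
theta = unify([("X", ("g", c, "W3")), ("X", ("g", "W1", "W2")), ("Y", "W1")])
y_value = resolve("Y", theta)
x_value = resolve("X", theta)
conflict = unify([("X", ("f", "V")), ("X", ("g", "W1", "W2"))])
```

Here `y_value` is the constant c, so Y is ground, while `x_value` is g(c, W3) and still contains a variable. The last call returns `None`, matching the observation that the constraint containing X = f(V) and X = g(W1,W2) is vacuous.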


To see the need for residual groups in the above example, consider removing the residual group in (g). Then the group $[f(X) = f(V)]_5$ would not be in (i), and this implies that the group $[X = V]_5$ would not be in (j), and this implies that the group $[X = g(c,W3)]_4$ would not be in (k). Hence it cannot be inferred that $Y$ is ground.


One could raise the question as to why the group in (c), which gave rise to the residual group in (g), could not itself be used for substitution into (a). (That is, consider substituting (c) into (a).) Allowing this implies that groups that are neither terminal nor residual are allowed to be substituted into other clusters. Hence many more unfolding steps are permitted in general. Moreover, this results in substantial loss of information. To see this, consider the program and (simplified) semantic constraints in Figure 10.6. The engine dictates that cluster (o) is first substituted into cluster (n) to obtain the constraint:

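$\vartheta^{2} \supseteq ([X_1 = f(Y_1),\, X_2 = f(Y_2)]^{R}_3,\; [X_1 \leftrightarrow X_2]_4)^{(p)}$

1. ← p(U_1, U_2).
2. p(f(X_1), X_2) ← q(X_1, X_2).
3. q(f(Y_1), f(Y_2)) ← r(Y_1, Y_2).
4. r(f(W), f(W)).

$\vartheta^{1} \supseteq ([U_1 = f(X_1),\, U_2 = X_2]_2)^{(m)}$
$\vartheta^{2} \supseteq ([X_1 = f(Y_1),\, X_2 = f(Y_2)]_3)^{(n)}$
$\vartheta^{3} \supseteq ([Y_1 = f(W),\, Y_2 = f(W)]_4)^{(o)}$
$\vartheta^{4} \supseteq (\,)$

Figure 10.6: Example Illustrating the Order of the Unfolding Steps

Cluster (p) is then substituted into cluster (m) to obtain (after simplification):

$\vartheta^{1} \supseteq ([U_1 = f(X_1)]^{R},\; [U_2 = Y_2]_3,\; [U_1 \leftrightarrow U_2]_4)$

Finally, cluster (o) is substituted into (m) to obtain the constraint

$\vartheta^{1} \supseteq ([U_1 = f(X_1)]^{R},\; [U_2 = f(W)]_4,\; [U_1 \leftrightarrow U_2]_4)^{(q)}$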

The engine then terminates and outputs the constraints involving clusters in substitution form. Now, the only constraint output for $\vartheta^{1}$ is the last constraint added involving cluster (q). Hence, we may conclude that $U_1$ is ground iff $U_2$ is ground.

Now, suppose that we allow the substitution of cluster (n) into (m), obtaining $\vartheta^{1} \supseteq ([U_1 = f(X_1)]^{R}, [U_2 = Y_2], [true])$, and then using (o) to substitute into this, we obtain $\vartheta^{1} \supseteq ([U_1 = f(X_1)]^{R}, [U_2 = Y_2]^{R}, [U_2 = f(W)], [true])$. Since the cluster in this constraint is in substitution form, it is output. Clearly it forbids the inference that $U_1$ is ground iff $U_2$ is ground. Intuitively, the problem here was that the substitution of (n) into (m) was performed before the information from (o) was used; this information could not be recovered later.


In order to prove correctness of the engine, it is necessary to first define the meaning of semantic constraints used by the engine. While the meaning of initial semantic constraints obtained from a program has already been outlined (see Definition 19), we now have to define the meaning of the clusters constructed by the algorithm, and in particular, deal with approximate groups.

If $\theta_1$ and $\theta_2$ are substitutions, then let $\theta_1 \circ \theta_2$, the composition of $\theta_1$ and $\theta_2$, be the substitution that maps $X$ into $\theta_1(\theta_2(X))$ for all program variables $X$. Also, write $\theta_1 \le \theta_2$ if there exists a substitution $\theta$ such that $\theta_2 = \theta \circ \theta_1$. Now, let mgi be a function that maps a sequence of substitutions $\theta_1, \ldots, \theta_n$ into a most general instance of the substitutions $\theta_i$, if one exists. That is, $\theta_i \le mgi(\theta_1, \ldots, \theta_n)$, $i = 1..n$, and for any other substitution $\theta$, if $\theta_i \le \theta$,
$i = 1..n$, then $mgi(\theta_1, \ldots, \theta_n) \le \theta$. For example, $mgi([X \mapsto f(W), Y \mapsto a], [X \mapsto f(b)])$ is $[X \mapsto f(b), Y \mapsto a]$ (or some renaming thereof). Next, define a
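function join that essentially maps a set $S$ of variables and a sequence of substitutions $\theta_1, \ldots, \theta_n$ into $mgi(\theta_1, \ldots, \theta_n)|_S$ after first renaming the variables in the domains of $\theta_1, \ldots, \theta_n$ to avoid variable conflicts. More formally, define $join(S, \theta_1, \ldots, \theta_n)$ by first picking a sequence of renamings $\theta'_1, \ldots, \theta'_n$ such that the sets of variables $\bigcup_{X \in S} var(\theta'_i(\theta_i(X)))$, $i = 1..n$, are disjoint. Then define $join(S, \theta_1, \ldots, \theta_n)$ to be $mgi(\theta'_1 \circ \theta_1, \ldots, \theta'_n \circ \theta_n)|_S$. For example, $join(S, [X \mapsto f(a), Y \mapsto W], [X \mapsto W])$ is equivalent to $mgi([X \mapsto f(a), Y \mapsto W_1], [X \mapsto W_2])|_S$ where $W_1$ and $W_2$ are new variables. Finally, extend the join function to apply over sets:

$$join(S, \Theta_1, \ldots, \Theta_n) \stackrel{\mathrm{def}}{=} \{\, join(S, \theta_1, \ldots, \theta_n) : \theta_i \in \Theta_i,\; i = 1..n \,\}.$$

An interpretation $\mathcal{I}$ of a collection of semantic constraints is a mapping from each variable $\vartheta^{\alpha}$ into a set of substitutions, denoted $\mathcal{I}(\vartheta^{\alpha})$. Such a mapping is extended to exact groups as follows:

$$\mathcal{I}([\tilde{s} = \tilde{t}]_{\alpha}) \stackrel{\mathrm{def}}{=} \{\, mgu(\tilde{s} = \lceil \theta(\tilde{t}) \rceil_{var(\tilde{s})}) : \theta \in \mathcal{I}(\vartheta^{\alpha}) \,\}.$$

Similarly, $\mathcal{I}([A]_{\alpha}) \stackrel{\mathrm{def}}{=} \mathrm{CONC}(A)$. The interpretation of clusters employs join as follows:

$$\mathcal{I}((\mathcal{G}_1, \ldots, \mathcal{G}_n)_{\alpha}) \stackrel{\mathrm{def}}{=} join(var(\alpha), \mathcal{I}(\mathcal{G}_1), \ldots, \mathcal{I}(\mathcal{G}_n)).$$

As usual, an interpretation $\mathcal{I}$ is a model of a collection of semantic constraints if $\mathcal{I}(\vartheta^{\alpha}) \supseteq \mathcal{I}(cl)$ for each constraint $\vartheta^{\alpha} \supseteq cl$ in the collection. A collection $C$ of semantic constraints is guaranteed to have a least model (modulo variable renaming), and this shall be denoted by $lm(C)$. Finally, the correctness of the engine can be characterized as follows: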


Theorem 11 (Correctness and Termination) Let $P$ be a program and let $\mathcal{D}$ be a finite* abstract domain. Then the unfolding engine terminates on $P$ using $\mathcal{D}$, and the least model of the output semantic constraints is a conservative approximation of the collecting semantics of $P$.

Proof Outline: Let $C_P$ be the semantic constraints for $P$. The first part of the proof establishes that $lm(C_P)$ corresponds to $P$'s collecting semantics. The proof is essentially similar to the proof of correspondence between environment constraints and collecting semantics (see Chapter 4), although the intermediate step of defining a ground collecting semantics is not needed.

The next part of the proof establishes that the algorithm constructs a conservative approximation of $lm(C_P)$. Let $C_{loop}$ denote the constraints obtained after execution of the main repeat-until loop of the unfolding engine. Let $C_{no\text{-}res}$ denote the constraints obtained by deleting the residual groups from $C_{loop}$, and let $C_{final}$ denote the final constraints, that is those obtained by selecting only the constraints involving substitution form clusters from $C_{no\text{-}res}$. Now, the main loop of the algorithm merely serves to add constraints to $C_P$. Moreover, the original constraints in $C_P$ do not contain any residual expressions. Therefore, $C_P$ is a subset of both $C_{loop}$ and $C_{no\text{-}res}$ and so $lm(C_P) \subseteq lm(C_{no\text{-}res})$.

The main part of the correctness proof establishes that the exhaustive application of the unfolding step generates sufficient substitution form clusters that these clusters alone characterize the least model of $C_{no\text{-}res}$. In
essence  we  prove  a  form  of  completeness  of  the  unfolding  step  with  respect 
to  the  least  model.  The  proof  of  this  requires  reasoning  about  the  specific 
formulation  of  the  unfolding  step. 

Termination is proved by showing that there are only a finite number of different clusters that can be produced. This is essentially because during an unfolding step, the function select ensures that the size of terms appearing in the new cluster is not larger than the size of those appearing in the previous clusters. The main complications are (a) the part of the simplification step that deals with head-only variables, since this step can increase the size of terms and introduce new variables, and (b) the renamings of equations during the unfolding step, since this may introduce new variables. A key part of the termination proof is that the number of new variables can be bounded. □

*Some algorithms use domains satisfying a weaker condition such as the "finite chain" property. Our engine can in general be adapted to terminate on such domains. We omit the details for simplicity.


10.5  Variations  of  the  Engine 


The main parameter of the engine is the domain used for approximation. We now discuss two other parts of the engine where there is scope for variation.

Consider  the  select  function  that  is  used  to  curtail  the  growth  of  group 
equations.  Another  reasonable  definition  of  select  can  be  obtained  by  using  a 
uniform  bound  on  the  size  of  terms,  the  so-called  “depth-k”  approximation. 
The important point is that the use of any select function does not affect correctness. It will, however, affect accuracy and termination. In general,
using  a  smaller  or  more  restrictive  function  will  make  the  proof  of  termina¬ 
tion  easier  (and  can  also  enhance  efficiency).  Using  a  bigger  function,  on 
the  other  hand,  is  more  accurate,  but  it  complicates  the  termination  proof 
in  general. 
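As a rough illustration of such a variant, the Python sketch below filters equations by a uniform bound on term depth. Everything here is an assumption made for the example: the thesis does not spell out the depth-k select, the function names are hypothetical, terms use the illustrative representation of variables as strings and other terms as functor-headed tuples, and depth is counted with variables and constants at depth 1.

```python
def is_var(t):
    return isinstance(t, str)

def depth(t):
    """Depth of a term; variables and constants have depth 1."""
    if is_var(t) or len(t) == 1:
        return 1
    return 1 + max(depth(a) for a in t[1:])

def select_depth_k(equations, k):
    """Keep only the equations whose right-hand side has depth at most k."""
    return [(s, t) for s, t in equations if depth(t) <= k]

composed = [
    ("X", ("g", ("a",))),
    ("Y", ("f", ("g", ("a",)))),
    ("Z", ("f", "V")),
]
kept = select_depth_k(composed, 2)
```

With k = 2 the equation for Y is discarded, since f(g(a)) has depth 3. A larger k keeps more equations and is therefore more accurate, at the cost of a larger space of possible clusters.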

Next  consider  the  output  of  the  engine.  As  defined,  this  consists  of  a 
collection  of  clusters  in  substitution  form.  Such  a  collection  is  an  explicit 
definition, for each program rule, of a set of substitutions. However, to maximize the structural information obtained from the engine, residual groups should not be deleted upon termination. This however raises the problem that if clusters contain residual groups, then they are no longer an explicit
representation  of  a  set  of  substitutions.  In  fact,  the  sets  of  values  defined  by 
(the  least  model  of)  these  equations  are,  in  general,  not  decidable. 

Perhaps  the  best  general  technique  for  extracting  information  from  these 
residual  groups  is  to  approximate  them  by  interpreting  them  as  set  con¬ 
straints,  and  then  employing  the  intersection  component  of  the  intersection- 
projection  algorithm  described  in  Chapter  7.  It  is  important  to  note  that 
the  engine  as  stated  already  has  much  of  the  analytical  power  of  the  set 
constraint  approach  to  analysis.  Indeed  the  third  program  in  Section  10.1 
shows  that  it  is  sometimes  strictly  more  accurate  than  the  set  constraint 
approach.  By  augmenting  the  engine  output  with  the  residual  groups,  and 
applying  the  intersection  algorithm,  the  engine  in  fact  becomes  uniformly 
more accurate than the set constraint algorithm.

0. initialize the array value;
1. repeat
2.     oldvalue := value;
3.     for each rule in P with label $\alpha$ and body $\tilde{A}$
4.         value[$\alpha$] := $\bigcup_{\tilde{\beta}} \mathcal{D}\text{-unify}(var(\alpha), \tilde{A}, \tilde{B}^{\tilde{\beta}}, value[\tilde{\beta}])$
5. until (value = oldvalue);
6. output value;

Figure  10.7:  The  Standard  Engine  (Bottom-Up) 

10.6  Comparison  with  Abstract  Interpretation 


It is difficult to provide a uniform characterization of the many abstract interpretation based algorithms for logic program analysis that have appeared
in  the  literature.  This  is  because  each  algorithm  is  designed  with  differ¬ 
ent  criteria.  In  particular,  ad  hoc  approximations  are  often  introduced  for 
efficiency  reasons.  However,  these  differences  are  rarely  fundamental  from 
a  conceptual  point  of  view.  Consequently,  it  is  possible  to  define  an  ide¬ 
alized  engine  that  encompasses  the  essence  of  the  abstract  interpretation 
technique  underlying  these  algorithms,  and  is  at  least  as  accurate  as  any  of 
them.  We  call  this  idealized  engine  the  standard  engine,  and  we  now  outline 
its  definition. 

In general, each analysis algorithm starts by associating a value from the chosen abstract domain with designated parts of the program. The operation of the algorithm then consists of repeatedly recomputing each value from the values previously computed. For a bottom-up analysis, an informal and simplified outline is given in Figure 10.7. The variables value and oldvalue are arrays indexed by rule labels. The sequence $\tilde{\beta}$ in line 4 ranges over all sequences $\beta_1, \ldots, \beta_n$ of rule labels such that the head of the rule with label $\beta_i$ is compatible with the $i$-th atom of $\tilde{A}$ (the body of the rule with label $\alpha$). $\tilde{B}^{\tilde{\beta}}$ denotes the sequence $H_1, \ldots, H_n$ such that $H_i$ is the head of the rule with label $\beta_i$. $value[\tilde{\beta}]$ denotes the sequence of abstract values $value[\beta_1], \ldots, value[\beta_n]$. Finally, the expression $\mathcal{D}\text{-unify}(var(\alpha), \tilde{A}, \tilde{B}^{\tilde{\beta}}, value[\tilde{\beta}])$ is intended to denote the result of abstractly unifying $\tilde{A}$ with the rule heads $\tilde{B}^{\tilde{\beta}}$, in the context of the abstract values $value[\tilde{\beta}]$, and restricting the variables in the resulting abstract value to $var(\alpha)$. More formally, if $S$ is a finite set of program variables, $\tilde{A}$ denotes $A_1, \ldots, A_n$, $\tilde{B}$ denotes $B_1, \ldots, B_n$ and $\tilde{\mathcal{A}}$ denotes $\mathcal{A}_1, \ldots, \mathcal{A}_n$, then $\mathcal{D}\text{-unify}(S, \tilde{A}, \tilde{B}, \tilde{\mathcal{A}})$ is

$$\mathrm{ABS}\Big(S, \Big\{\, mgu\big(A_1 = \lceil \theta_1(B_1) \rceil_{var(A_1)}, \;\ldots,\; A_n = \lceil \theta_n(B_n) \rceil_{var(A_n)}\big) : \theta_i \in \mathrm{CONC}(\mathcal{A}_i),\; i = 1..n \,\Big\}\Big).$$

1. ← eq(X,Y), p(f(Y,Z)).
2. p(U) ← q(U).
3. q(f(W,V)) ← r(W).
4. r(c).
5. eq(S,S).

Figure 10.8: Example Execution of the Standard Engine

Figure 10.8 contains an example execution of the standard engine. The abstract domain used here is $\mathcal{D}_{prop}$. Note that the $i$-th column describes value after $i$ iterations of the main loop of the standard engine.
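The control structure of the bottom-up standard engine is an ordinary iteration to a fixed point, which can be sketched in Python as follows. The abstract domain and the recomputation function here are toy stand-ins chosen only to make the loop executable (sets of constants, combined by union along a hypothetical dependency table); they are not $\mathcal{D}$-unify or any domain from the thesis. Termination of the loop assumes the recomputation is monotone over a finite domain.

```python
def iterate(labels, recompute, bottom):
    """Repeat 'recompute every label' until the array `value` is stable."""
    value = {label: bottom for label in labels}
    while True:
        oldvalue = dict(value)
        for label in labels:
            value[label] = recompute(label, value)
        if value == oldvalue:
            return value

# Toy instance: each label depends on the labels its rule body calls.
deps = {"p": ["q"], "q": ["r"], "r": []}
base = {"p": frozenset(), "q": frozenset(), "r": frozenset({"c"})}

def recompute(label, value):
    result = base[label]
    for callee in deps[label]:
        result = result | value[callee]
    return result

fixpoint = iterate(["p", "q", "r"], recompute, frozenset())
```

In this toy instance the value {c} propagates from r to q and then to p over successive iterations, after which one more pass detects that nothing changes.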


We can also define a standard engine for top-down analysis. An informal outline of this is given in Figure 10.9. The variables value and oldvalue are arrays indexed by labels (including rule labels and body atom labels). The sequence $\tilde{\beta}$ in line 5 ranges over all sequences $\beta_0, \ldots, \beta_j$ of labels such that $\beta_0$ is a body atom label and the body atom with label $\beta_0$ is compatible with $A_0$, and each $\beta_i$, $i \ge 1$, is a rule label such that the head of the rule with label $\beta_i$ is compatible with $A_i$. $\tilde{B}^{\tilde{\beta}}$ denotes the sequence $B_0, \ldots, B_j$ such that $B_0$ is the body atom with label $\beta_0$ and each $B_i$, $i \ge 1$, is the head of the rule with label $\beta_i$. $value[\tilde{\beta}]$ denotes the sequence of abstract values $value[\beta_0], value[\beta_1], \ldots, value[\beta_j]$. We note that the usual notions of computing the call and return substitutions for each rule are subsumed in the standard engine by the computation of the substitutions encountered at each program point.

0. initialize the array value;
1. do
2.     value := newvalue;
3.     for each rule with label $\alpha_{n+1}$, head $A_0$ and body $A_1^{\alpha_1}, \ldots, A_n^{\alpha_n}$
4.         for j = 0..n
5.             let $\tilde{A}$ denote $A_0, A_1, \ldots, A_j$;
6.             newvalue[$\alpha_{j+1}$] := $\bigcup_{\tilde{\beta}} \mathcal{D}\text{-unify}(var(\alpha), \tilde{A}, \tilde{B}^{\tilde{\beta}}, value[\tilde{\beta}])$
7. until (value = oldvalue)
8. output value;

Figure 10.9: The Standard Engine (Top-Down)

Both the bottom-up and top-down standard engines are parameterized by an abstract domain $\mathcal{D}$. These two engines capture the essence of most of the abstract interpretation algorithms in the literature. For example, the bottom-up engine applies in the case of [27, 43], and the top-down engine applies in the case of [11, 15, 51].

In  order  to  compare  the  standard  engine  with  the  unfolding  engine,  we 
will  now  formulate  the  operation  of  the  standard  engine  using  our  frame¬ 
work  of  group  equations,  clusters  and  cluster  substitution.  Recall  that  in 
our  framework,  we  can  perform  bottom-up  or  top-down  analysis  simply  by 
using  the  appropriate  semantic  constraints.  Now,  consider  modifying  the 
unfolding  engine  to  mimic  the  behavior  of  the  standard  engine  as  follows: 

• all terminal groups $[\tilde{s} = \tilde{t}]$ in the semantic constraints for $P$ are first replaced by $[\mathrm{ABS}(var(\tilde{s}), mgu(\tilde{s} = \tilde{t}))]$;

• the unfolding step is simplified so that no residual groups are produced, and this can be achieved simply by redefining $\lfloor [\mathcal{E}]_{\alpha} \rfloor = [\,]_{\alpha}$.

The main effect of these changes is that clusters in substitution form contain only approximate groups (and no terminal or residual groups). Hence the only operation required during unfolding is the composition $[\mathcal{E}] \circ [A]$ of an exact and approximate group. Call the resulting engine the restricted unfolding engine. Now, if $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are abstract values, then let $\mathcal{A}_1 \wedge \cdots \wedge \mathcal{A}_n$ denote the join of these abstract values (this can be easily formalized using mgi, ABS and CONC). Using this notation, we can state the correspondence between the restricted unfolding engine and the standard engine as follows:

Proposition 45 For all programs $P$, if the restricted unfolding engine outputs semantic constraints $C$ and the standard engine outputs the array value, then for each relevant label $\alpha$

$$value[\alpha] = \bigcup \{\, \mathcal{A}_1 \wedge \cdots \wedge \mathcal{A}_n : \vartheta^{\alpha} \supseteq ([\mathcal{A}_1]_{\alpha_1}, \ldots, [\mathcal{A}_n]_{\alpha_n})_{\alpha} \text{ appears in } C \,\}.$$

□


Clearly  the  modified  unfolding  steps  of  the  restricted  unfolding  engine 
are  less  accurate  than  the  unfolding  step  used  in  the  (normal)  unfolding 
engine.  Hence: 

Theorem  12  The  unfolding  engine  is  uniformly  more  accurate  than  the 
standard  engine  in  the  sense:  given  a  finite  abstract  domain,  the  output  of 
the  unfolding  engine,  for  any  program,  is  more  accurate  than  that  of  the 
standard  engine.  [] 

An alternative view of the differences between the standard and unfolding engines is as follows. The standard engine implements abstract interpretation by a successive iteration starting from some initial values $\mathcal{A}_0$ and computing the sequence $\mathcal{A}_0 \xrightarrow{T} \mathcal{A}_1 \xrightarrow{T} \mathcal{A}_2 \xrightarrow{T} \mathcal{A}_3 \cdots$ where $T$ represents the abstract semantic function at hand and the $\mathcal{A}_i$ are the successive abstract values being computed. The unfolding engine, on the other hand, uses a changing abstract semantic function and computes $\mathcal{A}_0 \xrightarrow{T_1} \mathcal{A}_1 \xrightarrow{T_2} \mathcal{A}_2 \xrightarrow{T_3} \mathcal{A}_3 \cdots$. Such a change takes place whenever an unfolding step is applied. The important point is that each function $T_i$ is at least as accurate as the original function $T$.

We conclude this section with the realization of some analysis algorithms using our engine. Let UNFOLD($\mathcal{D}$) denote the algorithm obtained by using the unfolding engine and the abstract domain $\mathcal{D}$; similarly for STANDARD($\mathcal{D}$). Let $\mathcal{D}_{sharing}$ denote the domain described in [27].

Corollary 2 For groundness analysis, UNFOLD($\mathcal{D}_{prop}$) is uniformly more accurate than [44]*. []

Corollary 3 For sharing analysis, UNFOLD($\mathcal{D}_{sharing}$) is uniformly more accurate than [27]. []

*By [12], UNFOLD($\mathcal{D}_{prop}$) is uniformly more accurate than a number of other algorithms as well.

10.7  Discussion


We  have  presented  a  new  engine  for  analysis  of  logic  programs.  Its  starting 
point  is  a  specific  formulation  of  semantic  constraints  of  a  program.  We 
have  illustrated  how  these  constraints  can  model  bottom-up  and  top-down 
execution.  Using  a  given  abstract  domain  D,  the  engine  performs  unfolding 
on  these  constraints.  Each  unfolding  is  augmented  with  a  notion  of  approx¬ 
imation,  defined  in  terms  of  D,  to  curtail  the  expansion  of  terms.  The  main 
advantage  of  our  specific  technique  of  unfolding,  which  uses  the  key  con¬ 
cept  of  residual  groups,  is  that  there  is  a  uniform  and  accurate  treatment 
of structural information. By formulating the standard analysis engine as a special case of the unfolding engine, we then showed that the unfolding
engine  is  uniformly  more  accurate  than  the  standard  engine. 

Yet to be addressed is the issue of efficiency. This has two main aspects.
First  is  the  initial  number  of  clusters  in  the  constraints.  This  number  is 
bounded  by  the  number  of  sequences  of  head  atoms  that  match  the  sequence 
of  body  atoms,  for  each  rule.  The  second  part  concerns  the  unfolding  step. 
Though  the  number  of  new  groups  produced  by  this  step  is  linear  in  the  size 
of  the  input  clusters,  the  termination  of  the  engine  is  based  on  an  exhaustive 
application  of  this  step.  Experience  with  the  prototype  implementation 
described  in  Chapter  8  provides  some  evidence  that  these  two  problems 
can  be  overcome.  First,  the  issue  of  matching  body  and  head  atoms  in  set 
constraints  is  essentially  the  same  as  in  the  unfolding  engine,  and  this  has  not 
proven  to  be  prohibitive.  Next  consider  the  issue  of  unfolding  efficiency.  In 
set  constraints,  there  is  a  notion  of  substitution  that  is  similar  to  unfolding 
in  terms  of  the  number  of  new  expressions  generated  (but  whose  semantics 
is  very  different).  The  implementation  exploits  the  crucial  fact  that  the 
substitution  step  gives  rise  to  mostly  redundant  expressions.  Because  this 
implementation  has  shown  promise,  we  expect  that  the  currently  planned 
implementation  of  the  unfolding  engine  can  be  engineered  to  be  practical. 

We  conclude  by  observing  that  the  notion  of  combining  set  constraint 
techniques  with  an  ability  to  reason  about  inter-variable  dependencies  is 
similar  in  spirit  to  some  recent  extensions  of  tree  automata  that  allow  a 
limited form of equality, such as Bogaert and Tison's tree automata with
equality [9] and the closely related shallow set constraints of Uribe [65]. Our
work differs in the use of more general (although less uniform) notions of
equality and, moreover, in the use of abstract domains to capture information
about dependencies that is tailored to the desired program analysis.


Chapter  11 


Analysis  of  Functional 
Languages 


We now show how set constraints may be used to analyze functional languages.
In particular, we outline extensions to the basic set constraint calculus
for higher-order functions as well as references and assignment. Importantly,
these extensions represent a natural extension of the idea of treating
variables  as  sets.  Each  extension  has  a  simple  definition  and  leads  to  an 
intuitive  notion  of  program  approximation.  Moreover,  the  key  advantage 
of  the  set  constraint  approach  -  accurate  and  uniform  treatment  of  data 
structures  -  is  preserved. 

The material in this chapter is very preliminary and informal. No proofs
of  correctness  are  given.  However,  many  of  the  ideas  have  been  implemented, 
and  we  shall  describe  some  of  the  results  of  this  effort.  We  also  discuss  the 
close  connections  between  the  use  of  set  constraints  to  analyze  functional 
languages  and  subtype  systems.  In  particular,  we  propose  formalizing  our 
analysis  as  an  "optimal”  system  of  simple  subtypes. 
11.1 Higher-Order Set Constraints


To  analyze  functional  languages  such  as  Standard  ML  [46],  set  constraints 
must  be  extended  with  a  mechanism  to  analyze  higher-order  functions.  In 
essence,  this  is  achieved  by  the  addition  of  three  new  components.  First,  a 
new  set  operator  apply  is  introduced.  Second,  a  new  collection  of  function 
symbols  are  introduced,  and  these  shall  be  used  to  denote  the  functions 
defined in a program (as opposed to the program's data constructors). This new
set of symbols shall be denoted by $\mathcal{F}$, and is assumed to be disjoint from
$\Sigma$, the set of data constructors. Third, for each symbol $F \in \mathcal{F}$, two new
variables dom(F) and ran(F) shall be introduced to respectively capture
the domain and range of the function F.

As usual, an interpretation of a collection of set constraints is a mapping
from each set variable (including dom(F) and ran(F), $F \in \mathcal{F}$) into sets of
values. Values are defined to be either symbols from $\mathcal{F}$ or $\Sigma$, or of the form
$f(v_1, \ldots, v_n)$ such that $f$ is an n-ary symbol from $\Sigma$ and each $v_i$ is a value.
An interpretation is extended to map from set expressions into sets of values
as follows:


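• $\mathcal{I}(f(se_1, \ldots, se_n)) = \{f(v_1, \ldots, v_n) : v_i \in \mathcal{I}(se_i),\ i = 1..n\}$, $f \in \Sigma$;

• $\mathcal{I}(F) = \{F\}$, $F \in \mathcal{F}$;

• $\mathcal{I}(f^{-1}_{(i)}(se)) = \{v_i : f(v_1, \ldots, v_n) \in \mathcal{I}(se)\}$;

• $\mathcal{I}(se_1 \cap \cdots \cap se_n) = \{v : v \in \mathcal{I}(se_i),\ i = 1..n\}$;

• $\mathcal{I}(apply(se_1, se_2))$ = [two-part right-hand side missing from this copy of the text]
  provided $\mathcal{I}(dom(F)) \supseteq \mathcal{I}(se_2)$, $F \in \mathcal{I}(se_1)$;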


The critical part of the interpretation is $\mathcal{I}(apply(se_1, se_2))$, which consists of
two parts. In essence, the first part corresponds to application involving data
structures, and the second part corresponds to the applications involving
functions defined in the program. The side condition on this definition
states that the set of values that each function F is applied to must be
contained in the set dom(F). Note that $\mathcal{I}$ is now a partial function that is
$\mathcal{X} \supseteq dom(id)$
$ran(id) \supseteq \mathcal{X}$
$\mathcal{E} \supseteq apply(id, c)$

Figure 11.1: Example Functional Program and Set Constraints

defined on a set expression se if $\mathcal{I}$ is defined on all (proper) subexpressions
of se. An interpretation $\mathcal{I}$ is a model of a collection of constraints if, for
each constraint $\mathcal{X} \supseteq se$ in the collection, $\mathcal{I}(se)$ is defined and $\mathcal{I}(\mathcal{X}) \supseteq \mathcal{I}(se)$
(and similarly for constraints $\mathcal{X} = se$ in the collection).
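
The clauses for constructor application, projection and intersection, together with the definition of a model, can be made concrete with a small sketch. The Python below is illustrative only and is not part of the thesis: it assumes values are encoded as strings (constants) or tuples (f, v1, ..., vn), and the helper names con, proj and is_model are hypothetical. The apply clause and its side condition are deliberately left out, so every expression here is defined.

```python
from itertools import product

def con(f, *arg_sets):
    """I(f(se_1, ..., se_n)): f applied to every combination of argument values."""
    return {(f, *vs) for vs in product(*arg_sets)}

def proj(f, i, values):
    """I(f^-1_(i)(se)): the i-th argument (1-based) of every f-term in values."""
    return {v[i] for v in values if isinstance(v, tuple) and v[0] == f}

def is_model(interp, constraints):
    """Check I(X) >= I(se) for each constraint X >= se.

    Each constraint is a pair (X, rhs), where rhs computes I(se) from interp.
    """
    return all(interp[x] >= rhs(interp) for x, rhs in constraints)

# Hypothetical example: P holds pairs, X must contain their first components
# that also occur in A (an intersection of two set expressions).
interp = {"P": con("pair", {"a", "b"}, {"c"}), "A": {"a", "z"}, "X": {"a"}}
constraints = [("X", lambda I: proj("pair", 1, I["P"]) & I["A"])]
```

Under these assumptions, is_model(interp, constraints) holds because the intersection evaluates to {a}; shrinking X to the empty set would violate the constraint.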

We  now  outline  how  such  constraints  may  be  used  to  analyze  functional 
programs,  using  a  fragment  of  ML.  We  shall  assume  that  bound  variables  are 
renamed  so  that  each  bound  variable  is  distinct.  Now,  consider  the  program 
P  in  Figure  11.1,  in  which  c  is  a  data  constructor  of  arity  0.  We  distinguish 
between  bound  variables  that  represent  functions  (such  as  id)  and  other 
bound variables (such as X); we call the latter program variables (note that
program  variables  can  take  functions  as  values).  For  each  program  variable 
we  introduce  a  set  variable  whose  purpose  is  to  capture  the  possible  values  of 
the program variable. We shall again use letters X, Y, ... to denote program
variables and $\mathcal{X}, \mathcal{Y}, \ldots$ for the corresponding set variables.

The  constraints  for  the  program  in  Figure  11.1  are  constructed  as  follows. 
Corresponding  to  the  first  line  of  the  program  (the  definition  of  id),  we 
introduce two constraints. The first is $\mathcal{X} \supseteq dom(id)$, which captures the fact
that the values for X must include all of the possible values with which id
may be called. The second constraint is $ran(id) \supseteq \mathcal{X}$, and this reflects the
fact that the return values for id must contain all possible values for X.
Finally, the third constraint $\mathcal{E} \supseteq apply(id, c)$ corresponds to the application
(id c). The set variable $\mathcal{E}$ is introduced to capture the value of the whole
program. The least model of these three constraints is

$\mathcal{X} \mapsto \{c\}$
$dom(id) \mapsto \{c\}$
$ran(id) \mapsto \{c\}$
$\mathcal{E} \mapsto \{c\}$

which exactly captures the run-time behavior of the program.


(program of Figure 11.1)

let fun id X = X
in
    id c
end
$\mathcal{X} \supseteq dom(id)$
$ran(id) \supseteq \mathcal{X}$
$\mathcal{E} \supseteq (apply(id, b), apply(id, c))$

Figure 11.2: Example Approximation of a Functional Program

However, the set constraints are not always exact. In general, the least
model of the set constraints for a program describes a conservative approximation
of the possible run-time values of variables. This is because the
set constraints ignore dependencies between variables and dependencies between
the domain and codomain of a function. The latter is achieved through
the use of the variables dom(F) and ran(F), which respectively collect the
values of the domain and codomain of the function F. To illustrate this,
consider  the  program  and  constraints  in  Figure  11.2.  The  least  model  of 
these  constraints  is 

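$\mathcal{X} \mapsto \{b, c\}$
$dom(id) \mapsto \{b, c\}$
$ran(id) \mapsto \{b, c\}$
$\mathcal{E} \mapsto \{(b, b), (b, c), (c, b), (c, c)\}$

and so $\mathcal{E}$, which consists of four pairs, defines an approximation of the
program: the only value the program actually returns is (b, c).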

Before giving further example constraints, we note that, for convenience,
we assume that each function is "named". Hence, terms containing anonymous
functions such as

((fn X => X) (fn Y => Y))

must first be rewritten into let fun F X = X and G Y = Y in (F G) end
before they are analyzed.


(program of Figure 11.2)

let fun id X = X
in
    (id b, id c)
end


11.2  Examples 

Consider  the  program  in  Figure  11.3,  involving  the  map  function.  We  use 
nil  and  cons  to  denote  the  data  constructors  for  lists.  Before  discussing  the 
constraints  for  this  program,  first  note  that  in  ML,  all  data  constructors 
let fun map(F, cons(X, L)) = cons(F X, map(F, L))
      | map _ = nil
    and id Y = Y
in
    map(id, cons(1, cons(2, cons(3, nil))))
end

$ran(map) \supseteq cons((apply(\mathcal{F}, \mathcal{X}), apply(map, (\mathcal{F}, \mathcal{L}))))$
$ran(map) \supseteq nil$
$\mathcal{F} \supseteq \{F : (F, cons((X, L))) \in dom(map)\}$
$\mathcal{X} \supseteq \{X : (F, cons((X, L))) \in dom(map)\}$
$\mathcal{L} \supseteq \{L : (F, cons((X, L))) \in dom(map)\}$
$ran(id) \supseteq \mathcal{Y}$
$\mathcal{Y} \supseteq dom(id)$
$\mathcal{E} \supseteq apply(map, (id, cons((1, cons((2, cons((3, nil))))))))$

Figure 11.3: Map Program and Its Set Constraints



and functions are unary. Non-unary data constructors and functions are obtained
using tupling operations. For example, the program term cons(3, nil)
is in fact an application of cons to the pair (3, nil). Hence, the set expression
corresponding to this program term is apply(cons, (3, nil)). However,
this is equivalent to the set expression cons((3, nil)), that is, the set expression
cons(se) where se is (3, nil). The set constraints in Figure 11.3 use
this simpler formulation for program terms involving applications of data
constructors.

Now, consider the definition of the function map. This contains two
match rules: map(F, cons(X, L)) = cons(F X, map(F, L)) and map _ =
nil. Each match rule generates one constraint for ran(map). Also, the first
match rule generates bindings for F, X and L. The values for these variables
are given by considering all environments $\rho$ such that $\rho((F, cons(X, L))) \in
dom(map)$. Specifically, the values for F, X and L are given by

$\{\rho(F) : \rho((F, cons(X, L))) \in dom(map)\}$

$\{\rho(X) : \rho((F, cons(X, L))) \in dom(map)\}$

$\{\rho(L) : \rho((F, cons(X, L))) \in dom(map)\}$

Recalling the definition of quantified expressions, these are exactly the
conditions imposed by the constraints on $\mathcal{F}$, $\mathcal{X}$ and $\mathcal{L}$ respectively.

More  generally,  match  and  case  expressions  may  contain  an  arbitrary 
number  of  match  rules.  To  model  the  sequential  nature  of  the  execution  of 
match  conditions,  we  shall  use  complement  constants.  For  example,  consider 
the  following  case  statement,  which  is  assumed  to  appear  inside  the  scope 
of  the  variable  L: 


case L of
    cons(U, cons(V, W)) => s1
  | cons(X, Y) => s2
  | nil => s3


We briefly review the execution of such a statement. First, an attempt is
made to match the value of L against cons(U, cons(V, W)). If this match
succeeds, then the values obtained for U, V and W are used in the evaluation
of s1. If the match fails, then an attempt to match L against cons(X, Y) is
made. Again, if the match succeeds, then the values obtained for X and Y
are used in the evaluation of s2. If the match fails, then the value of L is
matched against nil (and assuming that L is a list, this will always succeed),
and s3 is evaluated. Now, suppose that $\mathcal{U}, \mathcal{V}, \mathcal{W}, \mathcal{X}, \mathcal{Y}$ and $\mathcal{L}$ are the set
variables corresponding to U, V, W, X, Y and L respectively, and consider the
following constraints for modeling the possible variable bindings resulting
from execution of the case statement:

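$\mathcal{U} \supseteq \{U : cons((U, cons((V, W)))) \in \mathcal{L}\}$

$\mathcal{V} \supseteq \{V : cons((U, cons((V, W)))) \in \mathcal{L}\}$

$\mathcal{W} \supseteq \{W : cons((U, cons((V, W)))) \in \mathcal{L}\}$

$\mathcal{X} \supseteq \{X : cons((X, Y)) \in \mathcal{L} \cap \overline{cons((\top, cons((\top, \top))))}\}$

$\mathcal{Y} \supseteq \{Y : cons((X, Y)) \in \mathcal{L} \cap \overline{cons((\top, cons((\top, \top))))}\}$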

Note that in the constraints corresponding to the binding of X and Y, the
expression cons((X, Y)) is matched against values in the set described by
$\mathcal{L} \cap \overline{cons((\top, cons((\top, \top))))}$, that is, the set of values in $\mathcal{L}$ that do not match
the pattern cons((U, cons((V, W)))).
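
The effect of the complement in the constraints for X and Y can be sketched on a finite set of list values. This Python fragment is an illustration under stated assumptions, not the thesis's algorithm: lists are encoded as the string "nil" or tuples ("cons", head, tail), and the function name case_bindings is hypothetical. It projects each variable independently, which mirrors the way the constraints ignore dependencies between U, V and W.

```python
def is_cons(v):
    """True for values of the assumed form ("cons", head, tail)."""
    return isinstance(v, tuple) and len(v) == 3 and v[0] == "cons"

def case_bindings(L):
    """Bindings produced by the first two match rules of the case statement."""
    # Values matching the first pattern cons(U, cons(V, W)).
    first = {v for v in L if is_cons(v) and is_cons(v[2])}
    # L intersected with the complement of the first pattern.
    rest = L - first
    # Of those, the values matching the second pattern cons(X, Y).
    second = {v for v in rest if is_cons(v)}
    return {
        "U": {v[1] for v in first},
        "V": {v[2][1] for v in first},
        "W": {v[2][2] for v in first},
        "X": {v[1] for v in second},
        "Y": {v[2] for v in second},
    }

L = {"nil", ("cons", 1, "nil"), ("cons", 2, ("cons", 3, "nil"))}
bindings = case_bindings(L)
```

With this L, the two-element list feeds only U, V and W, and the one-element list feeds only X and Y; without the complement, X would also receive 2.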

To illustrate the construction of constraints corresponding to a case statement,
consider the program in Figure 11.4, which computes the disjunctive
normal form of a propositional formula. The propositional constants are
z0, ..., z9, and o, a and n denote or, and and not respectively. Figure 11.4
also contains the constraints corresponding to the definition of the function
dnf.


11.3  Extensions  and  Variations 


The basic approach of using constraints to analyze functional programs can
be extended in a number of ways. We first consider the imperative components
of the ML language, namely references and assignment. These can
be modeled as follows: for each expression ref(t) that appears in the program
to be analyzed, a distinct new constant $c_{ref}$ and a corresponding new
variable $val(c_{ref})$ are introduced. The new variable $val(c_{ref})$ is used to capture
the possible values assigned to ref cells generated by this occurrence of
ref. Each introduced constant is called a reference constant. Figure 11.5
gives an example program, its set constraints and the least model of the set
constraints. Three new kinds of set operators are employed. The meaning
of each operator is defined as follows:


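• $\mathcal{I}(seq(se_1, \ldots, se_n)) = \mathcal{I}(se_n)$ if $\mathcal{I}(se_i) \neq \{\}$ for each $i < n$,
  and $\mathcal{I}(seq(se_1, \ldots, se_n)) = \{\}$ if $\mathcal{I}(se_i) = \{\}$ for some $i < n$.

• $\mathcal{I}(assign(se_1, se_2)) = \{()\}$ provided $val(c_{ref}) \supseteq \mathcal{I}(se_2)$ for each reference
  constant $c_{ref} \in \mathcal{I}(se_1)$.

• $\mathcal{I}(deref(se)) = \bigcup_{c_{ref} \in \mathcal{I}(se)} val(c_{ref})$ where $c_{ref}$ ranges over reference constants.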
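
The three operators can be sketched as ordinary functions over finite sets. The Python below is a hedged illustration rather than the thesis's implementation: the dictionary val (mapping each reference constant to the set of values its cells may hold), the constant UNIT and all function names are assumptions made for the example. The side condition on assign is modeled by raising an error when the interpretation would be undefined.

```python
UNIT = "()"

def seq(*sets):
    """I(seq(se_1, ..., se_n)): the last set, unless some earlier set is empty."""
    *earlier, last = sets
    return set(last) if all(earlier) else set()

def assign(val, refs, values):
    """I(assign(se_1, se_2)): {()} when every cell in refs already holds values."""
    if not all(values <= val[c] for c in refs):
        raise ValueError("undefined: val(c) must contain every assigned value")
    return {UNIT}

def deref(val, refs):
    """I(deref(se)): the union of val(c) over the reference constants c in refs."""
    out = set()
    for c in refs:
        out |= val[c]
    return out

# Hypothetical interpretation: one reference constant whose cells hold id or f.
val = {"c_ref": {"id", "f"}}
```

For example, seq({1}, {2, 3}) yields {2, 3}, while seq(set(), {2}) yields the empty set because the first component has no value.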


There is also considerable scope for varying the constraints themselves.
In particular, the procedure for constructing constraints outlined in the previous
section associates two variables, dom(F) and ran(F), with each function
F in the program. This means that each function is analyzed once in the
datatype formulas = z0 | z1 | z2 | z3 | z4 | z5 | z6 | z7 | z8 | z9
                  | o of formulas * formulas
                  | a of formulas * formulas
                  | n of formulas

fun dnf(o(X1,Y1)) = o(dnf(X1), dnf(Y1))
  | dnf(a(X2,Y2)) = norm(a(dnf(X2), dnf(Y2)))
  | dnf(n(X3)) = norm(n(dnf(X3)))
  | dnf(X4) = X4
and norm(a(o(A1,B1),C1)) = o(norm(a(A1,C1)), norm(a(B1,C1)))
  | norm(a(C2,o(A2,B2))) = o(norm(a(C2,A2)), norm(a(C2,B2)))
  | norm(n(o(A3,B3))) = norm(a(dnf(n(A3)), dnf(n(B3))))
  | norm(n(a(A4,B4))) = o(dnf(n(A4)), dnf(n(B4)))
  | norm(n(n(A5))) = A5
  | norm(n(A7)) = n(A7)
  | norm(a(A8,A9)) = a(A8,A9)

$ran(dnf) \supseteq o((apply(dnf, \mathcal{X}1), apply(dnf, \mathcal{Y}1)))$
$ran(dnf) \supseteq apply(norm, a((apply(dnf, \mathcal{X}2), apply(dnf, \mathcal{Y}2))))$
$ran(dnf) \supseteq apply(norm, n(apply(dnf, \mathcal{X}3)))$
$ran(dnf) \supseteq \mathcal{X}4$
$\mathcal{X}1 \supseteq \{X1 : o((X1, Y1)) \in dom(dnf)\}$
$\mathcal{Y}1 \supseteq \{Y1 : o((X1, Y1)) \in dom(dnf)\}$
$\mathcal{X}2 \supseteq \{X2 : a((X2, Y2)) \in dom(dnf) \cap \overline{o((\top, \top))}\}$
$\mathcal{Y}2 \supseteq \{Y2 : a((X2, Y2)) \in dom(dnf) \cap \overline{o((\top, \top))}\}$
$\mathcal{X}3 \supseteq \{X3 : n(X3) \in dom(dnf) \cap \overline{o((\top, \top))} \cap \overline{a((\top, \top))}\}$
$\mathcal{X}4 \supseteq \{X4 : X4 \in dom(dnf) \cap \overline{o((\top, \top))} \cap \overline{a((\top, \top))} \cap \overline{n(\top)}\}$

Figure 11.4: DNF Program and Selected Constraints




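let fun id X = (1; X)
    and f Y = 1;
    val Z = ref(id)
in
    id 2; Z := f; (!Z) 3
end

Constraints                                                        Least model
$ran(id) \supseteq seq(1, \mathcal{X})$                            $ran(id) \mapsto \{1, 2, 3\}$
$\mathcal{X} \supseteq dom(id)$                                    $\mathcal{X} \mapsto \{1, 2, 3\}$
$ran(f) \supseteq 1$                                               $ran(f) \mapsto \{1\}$
$\mathcal{Z} \supseteq c_{ref}$                                    $\mathcal{Z} \mapsto \{c_{ref}\}$
$val(c_{ref}) \supseteq id$                                        $val(c_{ref}) \mapsto \{id, f\}$
$\mathcal{E} \supseteq seq(id\ 2, assign(\mathcal{Z}, f), deref(\mathcal{Z}))$    $\mathcal{E} \mapsto \{1, 2, 3\}$

Figure 11.5: Analysis of References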


sense  that  there  is  no  attempt  to  separately  analyze  the  function  on  different 
inputs.  For  example,  in  the  analysis  of  the  program 

let fun id X = X
in
    (id 1, id nil)
end

the two uses of id are combined, and so in the least model dom(id) is {1, nil}.
Hence the result of the program is approximated by {(1, 1), (1, nil), (nil, 1),
(nil, nil)}. In essence, the analysis is monomorphic.

It is natural to consider modifying the constraints generated from a program
to facilitate some degree of poly-variant analysis in which functions are
analyzed for a number of different calls. The main issue is how to introduce
and control the notion of different calls. One method considers each occurrence
of each function in a program, and separately analyzes each function
occurrence. For example, in the above program, the two occurrences of id in
(id 1, id nil) would be analyzed independently; the domain of the first would
be {1} and the domain of the second would be {nil}. Using this scheme,
the result of the program is approximated by {(1, nil)}.
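
The difference between the two schemes on (id 1, id nil) can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions, not the analysis itself: it hard-codes the identity function (so ran(id) equals dom(id)), encodes constants as strings, and the name analyze_pair is hypothetical.

```python
from itertools import product

def analyze_pair(args, polyvariant):
    """Approximate the value of the tuple (id a1, ..., id an).

    args lists the constant passed at each occurrence of id.
    """
    if polyvariant:
        # One copy of id per occurrence: each copy sees only its own argument.
        ranges = [{a} for a in args]
    else:
        # A single dom(id) and ran(id) shared by every occurrence.
        dom_id = set(args)
        ranges = [dom_id for _ in args]
    # Tuple components are combined independently (a cartesian product).
    return set(product(*ranges))

mono = analyze_pair(["1", "nil"], polyvariant=False)
poly = analyze_pair(["1", "nil"], polyvariant=True)
```

Here mono contains the four pairs given above, while poly contains only ("1", "nil").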
We observe that there is a close analogy between the use of function
occurrences and the notion of polymorphism embodied in the polymorphic
let typing rule for ML. Taking this analogy a step further, it is in principle
possible to extract set constraints from a let bound expression, simplify
these constraints and then appropriately generalize them, just as the type
inference rule for let in ML allows (certain) type variables in the type expression
obtained for a let bound expression to be generalized. Such an
approach may provide a basis for implementing separate analysis of distinct
function occurrences. We also note that the general problem of introducing
and controlling the notion of "different calls" arises in a number of contexts
in type theory. For example, in the intersection type discipline [64], type inference
procedures must limit the number of different types that a function
is given. See, for example, [17, 53].


11.4  Implementation 


The  main  modifications  to  the  set  constraint  algorithm  for  solving  the  new 
kinds  of  constraints  introduced  in  this  chapter  are  new  transformations  to 
simplify  the  operations  apply,  seq,  assign  and  deref.  In  essence,  these  new 
transformations  can  be  stated  as  follows: 

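• If C contains $\mathcal{X} \supseteq apply(F, a)$ where $F \in \mathcal{F}$, then output $dom(F) \supseteq a$
  and $\mathcal{X} \supseteq ran(F)$.

• If C contains $\mathcal{X} \supseteq seq(a_1, \ldots, a_n)$ and $lm(explicit(C))(a_i) \neq \{\}$ for
  $i < n$, then output $\mathcal{X} \supseteq a_n$.

• If C contains $\mathcal{X} \supseteq assign(c_{ref}, a)$ then output $val(c_{ref}) \supseteq a$ and $\mathcal{X} \supseteq ()$.

• If C contains $\mathcal{X} \supseteq deref(c_{ref})$ then output $\mathcal{X} \supseteq val(c_{ref})$.


where $a, a_1, \ldots, a_n$ are non-variable atomic expressions.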
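
As an illustration of the first transformation only, the following Python sketch rewrites each apply constraint into the two inclusions it outputs and then computes a least model by simple propagation. It is a toy under stated assumptions, not the prototype described in Chapter 8: the tuple encoding of constraints, the Const wrapper and the function name solve are all hypothetical, F must be a known function symbol, arguments are limited to constants and set variables, and seq, assign and deref are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    """A constant such as the nullary data constructor c."""
    name: str

def solve(constraints):
    """Least model for constraints of two assumed forms:

    ("sub", X, a)        meaning X >= a
    ("apply", X, F, a)   meaning X >= apply(F, a), F a function symbol
    where a is a set variable name (str) or a Const.
    """
    inclusions = []
    for c in constraints:
        if c[0] == "apply":
            _, x, f, a = c
            inclusions.append((f"dom({f})", a))   # output dom(F) >= a
            inclusions.append((x, f"ran({f})"))   # output X >= ran(F)
        else:
            _, x, a = c
            inclusions.append((x, a))
    model = defaultdict(set)
    changed = True
    while changed:  # propagate until no set variable grows
        changed = False
        for x, a in inclusions:
            values = {a.name} if isinstance(a, Const) else model[a]
            if not values <= model[x]:
                model[x] |= values
                changed = True
    return dict(model)

# The three constraints of Figure 11.1.
figure_11_1 = [
    ("sub", "X", "dom(id)"),
    ("sub", "ran(id)", "X"),
    ("apply", "E", "id", Const("c")),
]
model = solve(figure_11_1)
```

On the constraints of Figure 11.1 this maps each of X, dom(id), ran(id) and E to {c}, in agreement with the least model given in Section 11.1.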

The  prototype  set  constraint  implementation  described  in  Chapter  8  has 
been  extended  to  include  these  operators.  Very  preliminary  results  sug¬ 
gest  that  the  cost  of  solving  set  constraints  from  functional  programs  is 
not  substantial  and  is  comparable  to  type  inference  in  the  Standard  ML  of 
New Jersey compiler. Table 11.1 presents some benchmark results; again, all
                   time    equations
dnf                0.70    25+95
binary-add         0.19    26+49
poly-binary-add    1.09    246+287

Table 11.1: Preliminary Results for Three Functional Programs


times are in seconds on a Sun Sparc 1+ (24MB) running Mach and using
version 0.75 of Standard ML of New Jersey. Entries in the second column of
the table are of the form x + y, where x is the number of equations initially
generated from the program, and y is the number of equations added during
the execution of the algorithm. The dnf benchmark is based on the program
that appears in Figure 11.4. The binary-add program is a simple 12 line
program for adding numbers in binary representation. The poly-binary-add
benchmark is a program that simulates the poly-variant analysis in which different
occurrences of a function are analyzed separately. This is done by making a
separate copy of the body of each function for each occurrence of that function.
The program consists of about 130 lines. We remark that the results
in Table 11.1 are very preliminary, and no effort has been put into optimization
of the functional analysis part of the implementation. It is expected
that the results can be substantially improved.

The  analysis  of  functional  programs  appears  to  be  substantially  faster 
than  for  comparable  logic  programs.  The  main  reason  for  this  is  that  the 
intersection  operation  is  not  heavily  used  in  set  constraints  arising  from 
functional  programs.  In  fact,  the  main  place  where  intersection  is  used  is 
in the analysis of case statements, and these intersections are usually of a
very simple form. For many programs (including the dnf, binary-add and
poly-binary-add benchmarks), the uses of intersection can be solved without
introducing  new  variables.  The  set  based  analysis  of  such  programs  can  be 
performed  in  polynomial  time. 

We  conclude  with  an  example  of  the  results  of  the  analysis  of  a  program. 
Recall  the  dnf  program,  which  appears  in  Figure  11.4.  Suppose  we  wish  to 
approximate  the  result  of  applying  the  function  dnf  to  the  program  variable 
Q,  where  Q  is  bound  to  some  unknown  formula  (that  is,  Q  is  some  value 
constructed from o, a, n and the propositional constants z0, ..., z9). The
output  of  the  set  constraint  algorithm  for  this  analysis  is 
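$\mathcal{E} = z0 \cup z1 \cup z2 \cup z3 \cup z4 \cup z5 \cup z6 \cup z7 \cup z8 \cup z9 \cup o((\mathcal{E}, \mathcal{E})) \cup a((\mathcal{X}, \mathcal{X})) \cup n(\mathcal{Y})$
$\mathcal{X} = z0 \cup z1 \cup z2 \cup z3 \cup z4 \cup z5 \cup z6 \cup z7 \cup z8 \cup z9 \cup a((\mathcal{X}, \mathcal{X})) \cup n(\mathcal{Y})$
$\mathcal{Y} = z0 \cup z1 \cup z2 \cup z3 \cup z4 \cup z5 \cup z6 \cup z7 \cup z8 \cup z9$

where $\mathcal{E}$ is the set variable that captures the values of (dnf Q). Note that
this defines $\mathcal{E}$ to be exactly the set of formulas in disjunctive normal form.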
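
The three equations can be read as a regular tree grammar, and membership in each set is then a short recursive check. The Python sketch below is an illustration only: it assumes formulas are encoded as strings ("z0" to "z9") or tuples ("o", l, r), ("a", l, r) and ("n", t), assumes terms are well formed, and the names in_E, in_X and in_Y are hypothetical.

```python
CONSTS = {f"z{i}" for i in range(10)}

def in_Y(t):
    """Y: the propositional constants z0, ..., z9."""
    return t in CONSTS

def in_X(t):
    """X: constants, conjunctions a(X, X) and negated constants n(Y)."""
    if isinstance(t, tuple):
        if t[0] == "a":
            return in_X(t[1]) and in_X(t[2])
        if t[0] == "n":
            return in_Y(t[1])
        return False
    return in_Y(t)

def in_E(t):
    """E: disjunctions o(E, E) over the members of X."""
    if isinstance(t, tuple) and t[0] == "o":
        return in_E(t[1]) and in_E(t[2])
    return in_X(t)

dnf_formula = ("o", ("a", "z0", ("n", "z1")), "z2")   # (z0 and not z1) or z2
not_dnf = ("a", ("o", "z0", "z1"), "z2")              # (z0 or z1) and z2
```

Under this reading, in_E accepts dnf_formula and rejects not_dnf, since a disjunction may not occur beneath a conjunction.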


11.5  Discussion 


The  main  aspect  of  the  set  based  analysis  of  functional  programs  that  has  not 
been  addressed  is  correctness.  We  are  currently  investigating  a  direct  proof 
of  this  using  an  operational  semantics,  as  well  as  an  alternative  method 
using  connections  with  simple  subtype  systems  [47].  In  particular,  there 
appears  to  be  an  intuitive  characterization  of  the  results  of  the  set  based 
analysis  of  a  program  in  terms  of  an  optimal  system  of  simple  subtypes. 
To  see  why  this  is  so,  observe  that  in  a  system  of  simple  subtypes,  the 
“accuracy”  of  the  types  obtained  for  a  program  depends  intimately  on  the 
structure  of  the  underlying  base  types.  As  more  base  types  are  added,  the 
ability  of  the  type  system  to  distinguish  between  different  program  terms  is 
enhanced.  This  situation  is  analogous  to  that  of  abstract  interpretation  (see 
the  discussion  in  Section  5.6,  page  133).  Moreover,  just  as  set  based  analysis 
can be characterized as an optimal abstract interpretation that ignores inter-variable
dependencies, so it seems natural to seek a characterization of the
set based analysis of functional programs as an optimal subtype system.
We  also  note  that  there  are  intriguing  connections  between  the  kinds  of 
transformations  used  to  solve  the  set  operator  apply  (see  Section  11.4)  and 
the  algorithm  described  by  Mitchell  [47]  for  type  inference  in  simple  subtype 
systems. 

We  now  briefly  outline  the  related  literature.  One  of  the  early  uses  of 
constraints  in  the  context  of  functional  languages  is  by  Mishra  and  Reddy, 
who  described  the  use  of  (an  extended  notion  of)  regular  trees  for  generating 
types  for  untyped  functional  programs  in  [49].  The  type  system  described 
is discriminative in the sense that it is based on sets of terms that are
cartesian closed (specifically, the sets of terms are closed under the * operator
described in Section 5.6, page 134).

Subsequently,  Aiken  and  Murphy  described  a  type  inference  algorithm 
for FL in [3]. This system is also based on the use of regular trees for the
representation  of  types,  although  the  restriction  to  discriminative  types  is 
omitted.  In  essence,  the  focus  of  both  works  is  on  obtaining  some  conserva¬ 
tive  approximation  of  a  program.  However,  it  is  usually  difficult  to  under¬ 
stand  exactly  what  approximation  will  be  computed  for  a  given  program.  In 
particular,  the  algorithms  employed  often  introduce  ad  hoc  approximations 
as  they  execute.  For  example  [3]  uses  a  heuristic  for  dealing  with  recursion, 
and  although  this  ensures  that  the  algorithm  terminates,  it  results  in  an 
unpredictable  loss  of  information. 

In  contrast,  our  analysis  for  functional  programs  seeks  to  extend  the 
set  based  analysis  philosophy  of  constructing  simple  constraints  that  have 
a  straightforward  connection  with  the  program  at  hand,  and  then  solving 
these  constraints  exactly.  An  important  motivation  for  this  is  the  desire  to 
obtain  a  natural  and  intuitive  notion  of  program  approximation  that  can 
form  the  basis  of  an  expressive  subtype  system. 

More  closely  related  to  our  work  is  the  algorithm  for  analysis  of  higher 
order  functional  programs  by  Jones  [31].  Our  work  differs  from  [31]  in  two 
main  respects.  First,  we  analyze  a  strict  language,  whereas  [31]  considers  a 
lazy language. Second, and perhaps more importantly, we have used constraints
instead of grammars. The advantage of constraints is that they
provide  a  simple  and  intuitive  characterization  of  the  analysis  that  is  inde¬ 
pendent  of  algorithmic  considerations.  In  contrast,  the  analysis  in  [31]  is 
characterized  only  by  a  somewhat  intricate  grammar  rewriting  algorithm. 
Moreover,  by  using  constraints,  connections  with  other  type  systems  (par¬ 
ticularly  subtype  systems)  are  more  apparent. 

Very  recent  work  by  Aiken  and  Wimmers  [6]  is  also  closely  related  to 
ours.  However,  their  formulation  of  constraints  over  ideals  appears  to  dif¬ 
fer  substantially  from  our  approach,  although  the  precise  connections  are 
unclear  at  this  stage. 

We finally observe that implicit in set based analysis is a form of minimal
function graph computation or control flow analysis [35, 57]. In particular,
the least model of the set constraints provides a conservative approximation
of the possible function calls at each call site in the program. The notion
of control flow analysis embodied is essentially the same as Shivers' 0th
order control flow analysis of [57], although the information
we obtain is somewhat more accurate because of the improved treatment of
structures in set based analysis (in particular, a function can be stored in a
data structure and then recovered at some later stage).


Chapter  12 

Conclusions 


We  briefly  summarize  the  central  ideas  of  the  thesis  and  discuss  some  of  the 
strengths and weaknesses of the set based analysis approach. We also outline
the  current  status  of  this  work  and  highlight  some  important  areas  of  future 
work. 
The underlying objective of this work was to develop an approach to
program  analysis  in  which  the  definitional  and  algorithmic  aspects  of  the 
analysis  are  separated.  In  particular,  we  wanted  a  definition  of  program 
approximation  that  had  a  simple  underlying  intuition  and  could  be  stated 
in  a  declarative  non-algorithmic  manner.  Moreover,  we  required  a  definition 
that  yielded  accurate  information  on  program  structure  (an  area  where  tra¬ 
ditional  methods  have  been  weak)  and  could  be  computed  with  acceptable 
efficiency over the kinds of programs that typically appear in practice.

Toward  this  end  we  considered  an  approach  to  analysis  based  on  ignor¬ 
ing  inter-variable  dependencies.  This  was  formalized  by  viewing  program 
variables  as  sets  of  values  and  then  extending  this  view  to  provide  a  set 
based approximation of a program's collecting semantics. The core chapter
of the thesis shows that this approximation (denoted $sba_P$) is decidable.
The use of constraints is featured heavily throughout, both in the definition
of $sba_P$, and in its computation. In particular, we develop a calculus
of set constraints that is used to represent $sba_P$ and also forms the main
data structure of the algorithm for $sba_P$. Importantly, one central calculus
is  used  to  represent  the  set  based  approximation  of  a  number  of  different 
languages  and  operational  semantics.  The  specific  set  operators  required  for 
different  languages  vary  according  to  the  language’s  semantic  operations, 
but  the  essence  of  the  constraints  is  the  same  in  each  case. 

Not only is the set constraint calculus important for computing the specific
approximation $sba_P$, but it appears to be useful in its own right. In
particular,  it  provides  a  very  flexible  and  declarative  intermediate  language 
for  defining,  reasoning  about  and  computing  program  approximations  in  the 
set  based  style.  Numerous  variations  are  possible  in  the  way  set  constraints 
are  written  to  analyze  a  program.  For  example,  the  basic  set  constraints 
from  a  program  can  be  modified  to  provide  forms  of  analysis  that  distin¬ 
guish  between  different  instances  of  a  function  or  predicate  or  calls  from 
different  call  sites.  On  the  other  hand,  the  constraints  can  be  simplified 
to trade off some of the accuracy of the constraints against the cost
of the analysis. Although these modifications mean that the resulting set
constraints will not be strictly faithful to the approximation $sba_P$, they retain
many of the important intuitions and accuracy advantages inherent in
set based analysis. Set constraints can also be adapted to compute information
other than values of program variables. For example, in Chapter 9, we
outlined  adaptations  for  mode  and  structure  sharing  information. 
Figure 12.1: Compact Representation of Constraints
(panels: program; set based analysis; result of abstract interpretation)


Although the use of the inter-variable dependency approximation leads
to simple notions of program approximation and gives a uniform treatment
of program structure, the loss of inter-variable dependencies is a
substantial drawback for certain kinds of analysis. Motivated by the desire
to capture some information about inter-variable dependencies and also
maintain an accurate treatment of structure, we developed an extended
notion of set constraint that incorporates inter-variable dependencies
through an abstract interpretation style domain. This unfolding engine is
strictly more accurate than either abstract interpretation or set
constraint approaches to analysis. One of the tradeoffs for this gain in
accuracy is that there is no longer a declarative connection between a
program and its approximation.

A potential disadvantage of set based analysis is its computational cost.
In general, the set based analysis algorithms have worst case EXPTIME
behavior. However, in practice, it appears that programs rarely exhibit
this. Significant progress has been made towards practical set based
analysis by reformulating the set constraint algorithm and employing
appropriate representation techniques. Although much work remains,
practical set based analysis is within reach, particularly for functional
programs.

We observe that the use of constraints in program analysis has one
potential computational advantage over other methods. In particular,
constraints provide a very compact representation of a program
approximation because the set of values at one point can be defined
recursively in terms of the values at some other program point. In
contrast, abstract interpretation approaches construct a complete
representation of each set at each program point. For example, Figure 12.1
shows a program, the constraints obtained from set based analysis of the
program and the results from an abstract interpretation of the program
(using a domain that represents values exactly up to a depth of 4). In
essence, when constraints are used to analyze a program, the result is a
single structure (which is a regular tree grammar), and the specific sets
for each program variable and program point are identified by the different
non-terminals of the grammar. In contrast, abstract interpretation
approaches essentially construct a separate structure for each program
variable and program point.
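
The contrast between a single shared grammar and per-point explicit sets can be illustrated with a small sketch. It assumes a three-rule logic program of the shape p(f(X)) ← q(X), q(g(Y)) ← r(Y), r(h(a)); the non-terminal names P, Q, R and A and the function `terms` are hypothetical and are not taken from the thesis. Each non-terminal refers to another non-terminal rather than copying its contents, and the explicit sets are recovered only on demand.

```python
from itertools import product

# Hypothetical regular tree grammar: each non-terminal maps to a list of
# productions (constructor, [argument non-terminals]).
GRAMMAR = {
    "P": [("f", ["Q"])],   # P ⊇ f(Q)
    "Q": [("g", ["R"])],   # Q ⊇ g(R)
    "R": [("h", ["A"])],   # R ⊇ h(A)
    "A": [("a", [])],      # A ⊇ a
}


def terms(nonterminal, depth):
    """Enumerate the ground terms derivable from `nonterminal` using at
    most `depth` nested constructor applications."""
    if depth == 0:
        return set()
    result = set()
    for constructor, args in GRAMMAR[nonterminal]:
        if not args:
            result.add(constructor)
            continue
        arg_sets = [terms(arg, depth - 1) for arg in args]
        for combo in product(*arg_sets):
            result.add(f"{constructor}({','.join(combo)})")
    return result
```

Calling `terms("P", 4)` expands the shared grammar into the explicit set for P, which is the kind of per-point set that a depth-bounded abstract domain would store separately for every program point.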

Although the algorithms of set based analysis are simple and direct, a
number of the proofs are very complex. The main reason for this is that we
have separated the definition of program approximation from the algorithms
to compute it. Hence, the correctness proof of the $sba_P$ algorithm (and
this includes the correctness proofs for the translation of environment
constraints to set constraints and the correctness proofs for the set
constraint algorithm) must establish an exact correspondence between
$sba_P$ and the output of the algorithm. In other words, our correctness
proofs establish “iff” conditions. If we were just interested in showing
that our analysis computed a conservative approximation of the program’s
collecting semantics (and this is the format of the correctness proofs for
most other analysis algorithms), then it would suffice to consider only one
direction of these proofs. Moreover, the design of the set based
approximation was primarily motivated by the desire for a simple definition
of approximation. In a number of cases (notably in the approximation of
imperative programs), this resulted in increased algorithmic complexity.
In fact, a substantial part of the complexity
of  the  quantified  expression  algorithm  is  due  to  the  presence  of  projections 
in  program  terms  t  in  quantified  conditions  t  ^  se  and  t  j  se.  The  main 
difficulty  here  relates  to  the  definedness  of  environments  on  such  terms  t, 
and  this  is  the  sole  reason  for  the  introduction  of  the  safeness  invariant. 
Another factor contributing to the complexity of the quantified expression
algorithm is the presence of complement symbols, which are used to model
the inequalities in imperative programs. In other words, a number of the
complexities  in  the  algorithm  are  a  direct  consequence  of  the  complexities  of 
reasoning  about  imperative  programs.  We  note  that  if  the  only  quantified 
expression  constraints  of  interest  are  those  that  arise  from  logic  programs, 
then  the  quantified  expression  algorithm  can  be  very  greatly  simplified. 

The main concern of this thesis has been the foundational issues of
analysis that ignores inter-variable dependencies, and in particular, to
show that for a variety of languages and operational semantics, such
analysis is decidable. An additional concern has been to show that set
based analysis is practical, and this is still work in progress. A number
of extensions to the basic ideas have been sketched to demonstrate the
flexibility of the set based approach. Two particularly promising
extensions are the unfolding engine and the analysis of functional
programs. The unfolding engine provides a general purpose engine for
analyzing programs that combines accurate reasoning about structures with
an ability to employ abstract interpretation domains that have been
developed for reasoning about various aspects of inter-variable
dependencies. We are currently investigating the implementation of the
unfolding engine as well as adaptations of the engine to the analysis of
other languages such as constraint logic programming languages. The set
based analysis of functional programs appears to have close connections to
simple subtype systems, and this may potentially provide a very simple and
intuitive characterization of this analysis. Moreover, the analysis of
functional programs appears to be practical using current set constraint
implementation techniques, and further improvements are expected.
Appendix I: Existence of Least Models


Two main families of set calculi are used in the body of the thesis:
environment constraints and set constraints. To avoid repeating results on
basic properties of these calculi, this appendix proves these properties in
a generic set calculus which subsumes both environment constraints and set
constraints. The results themselves are not surprising and the proofs are
rather straightforward. They are included for completeness rather than
significance. The main property that is proved is that if all the
operations of a set calculus are monotonic, then constraints of the form
$var \supseteq exp$ have a model intersection property. An important
corollary of this is the existence of least models.

A generic set calculus is a tuple of the form
$(\mathcal{V}, \mathcal{OP}, \mathcal{D}, [\ ])$ where $\mathcal{V}$ is a
set of variables, $\mathcal{OP}$ is a set of operators, each with a unique
arity, $\mathcal{D}$ is the set of underlying values of the calculus, and
$[\ ]$ defines the meaning of the operators by associating to each
$op \in \mathcal{OP}$ of arity $n$ a function $[op]$ which maps each
$n$-tuple of subsets of $\mathcal{D}$ into a subset of $\mathcal{D}$. In the
context of such a calculus, expressions are defined to be either variables
or of the form $op(exp_1, \ldots, exp_n)$ where $op$ is an operator whose
arity is $n$, and the $exp_i$ are expressions. A constraint is of the form
$exp \supseteq exp'$ where $exp$ and $exp'$ are expressions. An
interpretation $\mathcal{I}$ is a mapping from variables into subsets of
$\mathcal{D}$, and is extended to map from expressions into subsets of
$\mathcal{D}$ in the obvious manner:

\[ \mathcal{I}(op(exp_1, \ldots, exp_n)) = [op](\mathcal{I}(exp_1), \ldots, \mathcal{I}(exp_n)) \]

An interpretation is a model of a collection of constraints if, for each
constraint $exp \supseteq exp'$ contained in the collection, it is the case
that $\mathcal{I}(exp) \supseteq \mathcal{I}(exp')$. Interpretations are
ordered componentwise: $\mathcal{I} \supseteq \mathcal{I}'$ if
$\mathcal{I}(var) \supseteq \mathcal{I}'(var)$ for each variable $var$ in
$\mathcal{V}$.
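
The definitions above can be made concrete with a minimal sketch over a finite value universe. The encoding is an assumption made for illustration only: expressions are variable names or tuples `(op_name, arg_1, ..., arg_n)`, interpretations are dictionaries from variable names to frozensets, and the operators `zero` and `union` are hypothetical examples rather than operators of the thesis.

```python
def evaluate(exp, interp, ops):
    """Extend an interpretation from variables to expressions."""
    if isinstance(exp, str):          # a variable
        return interp[exp]
    op_name, *args = exp              # op(exp_1, ..., exp_n)
    return ops[op_name](*(evaluate(arg, interp, ops) for arg in args))


def is_model(interp, constraints, ops):
    """True if interp(lhs) is a superset of interp(rhs) for each constraint."""
    return all(
        evaluate(lhs, interp, ops) >= evaluate(rhs, interp, ops)
        for lhs, rhs in constraints
    )


def includes(interp_a, interp_b):
    """Componentwise ordering: interp_a(var) ⊇ interp_b(var) for all var."""
    return all(interp_a[var] >= interp_b[var] for var in interp_a)


# Hypothetical operators over the value universe {0, 1, 2}.
OPS = {
    "zero": lambda: frozenset({0}),
    "union": lambda a, b: a | b,
}

# The constraints  X ⊇ zero()  and  Y ⊇ union(X, Y).
CONSTRAINTS = [("X", ("zero",)), ("Y", ("union", "X", "Y"))]

good = {"X": frozenset({0}), "Y": frozenset({0, 1})}
bad = {"X": frozenset(), "Y": frozenset({0, 1})}
```

Here `good` satisfies both constraints, while `bad` violates the first because its value for X does not contain 0.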

A generic set calculus is monotonic if each set operator
$op \in \mathcal{OP}$ is such that $[op]$ is monotonic in each argument.
Given such a set calculus, an easy structural induction proves that
expressions are monotonic in the following sense:

Proposition 46 If $\mathcal{I} \supseteq \mathcal{I}'$ then
$\mathcal{I}(exp) \supseteq \mathcal{I}'(exp)$.

Proof: The proof is by a straightforward structural induction on $exp$. In
the base case, where $exp$ is a variable, the proof follows immediately
from the fact that $\mathcal{I}(var) \supseteq \mathcal{I}'(var)$ for all
variables $var$. For the induction case, suppose that $exp$ is of the form
$op(exp_1, \ldots, exp_n)$, and suppose that the proposition holds for each
$exp_i$. This means that $\mathcal{I}(exp_i) \supseteq \mathcal{I}'(exp_i)$,
$i = 1..n$, and combining this with the monotonicity of $[op]$ proves that

\[ [op](\mathcal{I}(exp_1), \ldots, \mathcal{I}(exp_n)) \supseteq [op](\mathcal{I}'(exp_1), \ldots, \mathcal{I}'(exp_n)). \]

It is immediate that
$\mathcal{I}(op(exp_1, \ldots, exp_n)) \supseteq \mathcal{I}'(op(exp_1, \ldots, exp_n))$. □

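Where $\mathcal{S}$ is a collection of interpretations, define that
$\bigcap\mathcal{S}$ is the intersection of these interpretations defined by

\[ (\bigcap\mathcal{S})(var) = \bigcap_{\mathcal{I} \in \mathcal{S}} \mathcal{I}(var) \]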

Now, consider a collection of constraints $C$ such that each constraint is
of the form $var \supseteq exp$ where $var$ is a variable from
$\mathcal{V}$ and $exp$ is an expression. Such constraints are said to be in
variable-expression form, and they satisfy the following model intersection
property:

Proposition 47 (Model Intersection Property) Let $C$ be a collection
of variable-expression form constraints in a monotonic set calculus. If
$\mathcal{S}$ is a collection of models of $C$, then $\bigcap\mathcal{S}$ is
a model of $C$.

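Proof: Let $var \supseteq exp$ be any constraint in $C$, and consider the
following chain of inequalities:

\[ (\bigcap\mathcal{S})(var) \;=\; \bigcap_{\mathcal{I} \in \mathcal{S}} \mathcal{I}(var) \;\supseteq\; \bigcap_{\mathcal{I} \in \mathcal{S}} \mathcal{I}(exp) \;\supseteq\; (\bigcap\mathcal{S})(exp) \]

The first equality follows immediately from the definition of
$\bigcap\mathcal{S}$. For the second inequality, recall that each
$\mathcal{I} \in \mathcal{S}$ is in fact a model of $var \supseteq exp$, and
so $\mathcal{I}(var) \supseteq \mathcal{I}(exp)$. The inequality then
follows from a simple property of intersection. For the third inequality,
note that $\mathcal{I} \supseteq \bigcap\mathcal{S}$ for each
$\mathcal{I} \in \mathcal{S}$, and so
$\mathcal{I}(exp) \supseteq (\bigcap\mathcal{S})(exp)$ by Proposition 46.
The third inequality then follows from the fact that the intersection of a
collection of sets, where each set is a superset of
$(\bigcap\mathcal{S})(exp)$, must also be a superset of
$(\bigcap\mathcal{S})(exp)$. Hence $\bigcap\mathcal{S}$ is a model of
$var \supseteq exp$ and it follows that $\bigcap\mathcal{S}$ is a model of
$C$. □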

Corollary 4 (Least Models) If $C$ is a collection of variable-expression
form constraints in a monotonic set calculus then $C$ has a least model.

Proof: Let $\mathcal{S}$ be the set of all models of $C$. By Proposition 47,
$\bigcap\mathcal{S}$ must also be a model of $C$. But $\bigcap\mathcal{S}$
is smaller than each $\mathcal{I} \in \mathcal{S}$, and so it must be the
least model of $C$. □
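
The model intersection property and the least model can be checked on a small example. The sketch below makes a strong simplifying assumption: the value universe is finite ({0, 1, 2, 3}), so the least model can be reached by simply iterating upwards from the empty interpretation. This is not the algorithm of the thesis, which deals with infinite sets of terms where such iteration need not terminate. The operators `zero`, `succ` and `union` and all function names are hypothetical.

```python
def evaluate(exp, interp, ops):
    """Evaluate a variable (str) or an expression (op_name, arg_1, ...)."""
    if isinstance(exp, str):
        return interp[exp]
    op_name, *args = exp
    return ops[op_name](*(evaluate(arg, interp, ops) for arg in args))


def is_model(interp, constraints, ops):
    """Check interp(var) ⊇ interp(exp) for every constraint (var, exp)."""
    return all(interp[var] >= evaluate(exp, interp, ops)
               for var, exp in constraints)


def meet(models):
    """Pointwise intersection of a non-empty list of interpretations."""
    return {var: frozenset.intersection(*(m[var] for m in models))
            for var in models[0]}


def least_model(variables, constraints, ops):
    """Iterate from the empty interpretation until nothing changes.
    Terminates here only because the universe is finite and the
    operators are monotonic."""
    interp = {var: frozenset() for var in variables}
    changed = True
    while changed:
        changed = False
        for var, exp in constraints:
            value = evaluate(exp, interp, ops)
            if not value <= interp[var]:
                interp[var] = interp[var] | value
                changed = True
    return interp


# Hypothetical monotonic operators over the universe {0, 1, 2, 3}.
OPS = {
    "zero": lambda: frozenset({0}),
    "succ": lambda s: frozenset(x + 1 for x in s if x + 1 <= 3),
    "union": lambda a, b: a | b,
}

# X ⊇ zero(),  Y ⊇ succ(X),  Z ⊇ union(X, Y)
CONSTRAINTS = [("X", ("zero",)), ("Y", ("succ", "X")),
               ("Z", ("union", "X", "Y"))]

M1 = {"X": frozenset({0, 2}), "Y": frozenset({1, 3}),
      "Z": frozenset({0, 1, 2, 3})}
M2 = {"X": frozenset({0, 3}), "Y": frozenset({1}),
      "Z": frozenset({0, 1, 3})}

LEAST = least_model(["X", "Y", "Z"], CONSTRAINTS, OPS)
```

Both `M1` and `M2` are models, their pointwise intersection `meet([M1, M2])` is again a model, and `LEAST` lies below all three, as Proposition 47 and Corollary 4 predict for this instance.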

We conclude this appendix with three properties relating to least models
of variable-expression form constraints. First, define that a constraint of
the form $var \supseteq exp$, where $var$ is a variable and $exp$ is an
expression, is called a lower bound for the variable $var$.

Proposition 48 Let $C$ be a collection of variable-expression form
constraints in a monotonic set calculus. If $v \in lm(C)(var)$ then $C$
contains a lower bound for $var$ of the form $var \supseteq exp$ such that
$v \in lm(C)(exp)$.

Proof: Let $var$ be a variable and let $v$ be a value such that
$v \in lm(C)(var)$. Suppose that all constraints of the form
$var \supseteq exp$ in $C$ are such that $v \notin lm(C)(exp)$. Then define
an interpretation $\mathcal{I}$ by

\[ \mathcal{I}(var') = \begin{cases} lm(C)(var) - \{v\} & \text{if } var' \text{ is } var \\ lm(C)(var') & \text{otherwise} \end{cases} \]

Consider a constraint of the form $var \supseteq exp$ in $C$. Clearly
$lm(C) \supseteq \mathcal{I}$, and so
$lm(C)(exp) \supseteq \mathcal{I}(exp)$. Combining this with the definition
of $\mathcal{I}$ and the fact that $lm(C)$ is a model of
$var \supseteq exp$ proves that

\[ \mathcal{I}(var) = lm(C)(var) - \{v\} \supseteq lm(C)(exp) - \{v\} = lm(C)(exp) \supseteq \mathcal{I}(exp) \]

and so $\mathcal{I}$ is a model of $var \supseteq exp$. Now consider a
constraint of the form $var' \supseteq exp$ in $C$ where $var'$ is
different from $var$. Since
$\mathcal{I}(var') = lm(C)(var') \supseteq lm(C)(exp) \supseteq \mathcal{I}(exp)$,
it follows that $\mathcal{I}$ is also a model of this constraint. Hence
$\mathcal{I}$ is a model of $C$ that is less than $lm(C)$, and this
contradicts the assumption that $lm(C)$ is the least model of $C$. □

An important corollary of this proposition is:


Proposition 49 Let $C$ be a collection of variable-expression form
constraints in a monotonic set calculus. If $var \supseteq exp$ is the only
lower bound for $var$ then $lm(C)(var) = lm(C)(exp)$.


The final proposition states a simple property of least models.


Proposition 50 If $C_1$ and $C_2$ are collections of constraints such that
$lm(C_1)$ and $lm(C_2)$ both exist, then

\[ C_1 \subseteq C_2 \text{ implies } lm(C_1) \subseteq lm(C_2). \]


Proof: Clearly any model of $C_2$ is a model of $C_1$. Hence $lm(C_2)$ is a
model of $C_1$. Moreover, any model of $C_1$ must be larger than $lm(C_1)$,
and so $lm(C_1) \subseteq lm(C_2)$. □
Appendix II: Abstract Interpretation and Monadic Programs


To try to quantify the limitations of abstract interpretation, we shall
outline why, unlike set constraint approaches, abstract interpretation
cannot be exact over the class of monadic programs. Unfortunately, a
completely formal and general account of this argument is very difficult to
formulate, because it is always possible to give ad hoc extensions to an
abstract interpretation algorithm, so that it can compute exact information
on a specific monadic program. However, this cannot be done in a uniform
manner to cover all monadic programs.

We therefore consider a somewhat restricted class of abstract
interpretations based on the bottom-up semantics of logic programs. The
underlying concrete domain shall be environments, and we shall consider
arbitrary abstract domains for representing environments. After outlining
the basic arguments, we shall show how they can be extended to abstract
interpretation involving the widening and narrowing operators.

We begin by considering the following program, where $c$ is a constant and
$f$ is a monadic function symbol.

1. p(c).

2. p(f(X)) ← p(X).

Denote this program by $P_1$. We formalize the collecting semantics of
$P_1$ by collecting sets of environments with each program rule.
Specifically, define that an association $d$ is a mapping from program
rules into sets of environments, and let $\bot$ denote the association that
maps each program rule into the empty set. Now, the exact semantic function
for $P_1$ maps from and into such associations such that $d$ is mapped into
$d'$ where

\[
\begin{array}{lcl}
d'(1) & = & \{\rho : true\} \\
d'(2) & = & \{\rho : \rho(X) = c \text{ for some } \rho' \in d(1), \text{ or } \rho(X) = \rho'(f(X)) \text{ for some } \rho' \in d(2)\}
\end{array}
\]

This definition can be more simply stated as:

\[
\begin{array}{lcl}
d'(1) & = & \{\text{all environments}\} \\
d'(2) & = & \{\rho : \rho(X) = c \text{ and } d(1) \neq \{\}, \text{ or } \rho(X) = \rho'(f(X)) \text{ for some } \rho' \in d(2)\}
\end{array}
\]

Let $\mathcal{F}_1$ denote this semantic function. Now, consider the Kleene
sequence $\bot, \mathcal{F}_1(\bot), \mathcal{F}_1(\mathcal{F}_1(\bot)), \ldots$
generated by this function, and note that it does not converge finitely.
The $j$-th element of this sequence, for $j \geq 2$, has the form

\[
d_j(\alpha) = \begin{cases} \{\text{all environments}\} & \text{if } \alpha = 1 \\ \Theta_{j-2} & \text{if } \alpha = 2 \end{cases}
\]

where $\Theta_j$ is the set of environments
$\{[X \mapsto c], \ldots, [X \mapsto f^j(c)]\}$.
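
The non-convergence of this Kleene sequence can be observed directly with a minimal sketch. The encoding is an assumption made for illustration: an association is a dictionary whose entry for rule 1 is a flag (`True` standing for the set of all environments, `False` for the empty set) and whose entry for rule 2 is the set of terms bound to X. The names `step`, `kleene` and `theta` are hypothetical.

```python
def apply_f(term):
    """Build the term f(term) as a string."""
    return f"f({term})"


def theta(j):
    """The values of X in the set of environments [X -> c], ..., [X -> f^j(c)]."""
    values, term = set(), "c"
    for _ in range(j + 1):
        values.add(term)
        term = apply_f(term)
    return frozenset(values)


# The association that maps every rule to the empty set of environments.
BOTTOM = {1: False, 2: frozenset()}


def step(d):
    """One application of the exact semantic function for the program
    p(c).  p(f(X)) <- p(X)."""
    base = {"c"} if d[1] else set()               # X = c once rule 1 has fired
    grown = {apply_f(term) for term in d[2]}      # X = f(v) for each earlier v
    return {1: True, 2: frozenset(base | grown)}


def kleene(j):
    """The j-th element of the Kleene sequence starting from BOTTOM."""
    d = BOTTOM
    for _ in range(j):
        d = step(d)
    return d
```

For every `j >= 2`, `kleene(j)[2]` equals `theta(j - 2)`, and each step adds one new term, so the sequence never stabilizes.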


In abstract interpretation, sets of environments are approximated using an
abstract domain and this induces an approximate semantic function. There
are many different formulations of abstract interpretation, differing in
the specification of abstraction and concretization functions, and
algebraic properties relating the concrete domain, the abstract domain and
these functions. However, it is generally the case that the abstract domain
can be treated as a collection of subsets of environments, and the
abstraction function maps any collection $\Theta$ of environments into the
smallest superset of $\Theta$ that is contained in the abstract domain.
Since the abstraction function and the abstract domain are so closely
related, we shall denote both by the same symbol, $A$. We shall assume that
the abstract domain contains the empty set and the set of all environments.
Now, given a semantic function $\mathcal{F}$ and an abstract domain $A$, the
approximate semantic function $\mathcal{F}^A$ can be defined by*:

\[ \mathcal{F}^A(d) = A(\mathcal{F}(d)). \]

* Strictly speaking, many abstract interpretations use a conservative
approximation of this approximate semantic function for algorithmic
reasons. The formulation given here is an idealization of abstract
interpretation, and represents an upper bound on its accuracy.

Returning to the program $P_1$, the approximate semantic function
corresponding to $\mathcal{F}_1$ maps an association $d$ into $d'$ where

\[
\begin{array}{lcl}
d'(1) & = & \{\text{all environments}\} \\
d'(2) & = & A(\{\rho : \rho(X) = c \text{ and } d(1) \neq \{\}, \text{ or } \rho(X) = \rho'(f(X)) \text{ for some } \rho' \in d(2)\})
\end{array}
\]

Denote this semantic function by $\mathcal{F}_1^A$. Now, if the abstract
interpretation of $P_1$ is to terminate, then the sequence
$\bot, \mathcal{F}_1^A(\bot), \mathcal{F}_1^A(\mathcal{F}_1^A(\bot)), \ldots$
must converge finitely to some association $d$. Clearly it must be the case
that $d(2) \supseteq \{[X \mapsto f^n(c)] : n \geq 0\}$. This means that, at
some step, the Kleene sequence for $\mathcal{F}_1^A$ must differ from the
corresponding step of the Kleene sequence for $\mathcal{F}_1$. Let $i$ be
the step such that the Kleene sequences first differ. If $d_i$ is the
$i$-th step of the sequence for $\mathcal{F}_1^A$, then it must be the case
that $d_i(2)$ is a proper superset of $\Theta_{i-2}$, because $A$ is
monotonic.


Now, using the value $i$, construct the logic program $P_2$ as follows:

1. p(c).

2. p(f(X)) ← q(X).

3. q(f(c)) ← p(c).

4. q(f(f(c))) ← q(f(c)).

...

i+2. q(f^i(c)) ← q(f^{i-1}(c)).

The semantic function corresponding to $P_2$ can be stated as
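\[
\begin{array}{lcl}
d'(1) & = & \{\text{all environments}\} \\
d'(2) & = & \{\rho : \rho(X) = c \text{ and } d(3) \neq \{\}, \text{ or} \\
      &   & \quad \rho(X) = f(c) \text{ and } d(4) \neq \{\}, \text{ or} \\
      &   & \quad \ldots \\
      &   & \quad \rho(X) = f^{i-2}(c) \text{ and } d(i+1) \neq \{\}\} \\
d'(3) & = & \{\text{all environments}\} \\
d'(4) & = & \{\rho : d(3) \neq \{\}\} \\
d'(5) & = & \{\rho : d(4) \neq \{\}\} \\
      & \vdots & \\
d'(i+2) & = & \{\rho : d(i+1) \neq \{\}\}
\end{array}
\]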


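Let $\mathcal{F}_2$ denote this semantic function, and consider the Kleene
sequence $\bot, \mathcal{F}_2(\bot), \mathcal{F}_2(\mathcal{F}_2(\bot)), \ldots$
that it generates. The $j$-th element of this sequence, for $j \geq 2$, has
the form

\[
d_j(\alpha) = \begin{cases}
\{\text{all environments}\} & \text{if } \alpha = 1 \\
\Theta_{j-2} & \text{if } \alpha = 2 \\
\{\text{all environments}\} & \text{if } 3 \leq \alpha \leq j \\
\{\} & \text{if } j < \alpha \leq i + 2
\end{cases}
\]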

Note that $\mathcal{F}_1$ and $\mathcal{F}_2$ are quite closely related, and
that the first $i$ elements of the respective Kleene sequences for these
functions are virtually identical. It follows that the approximate semantic
functions $\mathcal{F}_1^A$ and $\mathcal{F}_2^A$ are closely related, and in
fact it is easy to verify that the respective Kleene sequences for
$\mathcal{F}_1^A$ and $\mathcal{F}_2^A$ are related in the following sense:
if $j \leq i$ and $d_{1,j}$ and $d_{2,j}$ are respectively the $j$-th
elements of the Kleene sequences for $\mathcal{F}_1^A$ and
$\mathcal{F}_2^A$, then

\[
\begin{array}{lcl}
d_{1,j}(1) & = & d_{2,j}(1) \\
d_{1,j}(2) & = & d_{2,j}(2)
\end{array}
\]

This means that the analysis of $P_2$ using the abstract domain $A$ cannot
be exact, since the least fixed point of $\mathcal{F}_2^A$ must yield an
association $d$ such that $d(2)$ is a proper superset of $\Theta_i$.

Finally, consider the operation of widening. Widening allows the use of
abstract domains that would otherwise give non-terminating analysis. In
effect, it allows the abstraction function to take, as an additional
argument, the values obtained from the previous iteration. This is used to
detect and truncate ascending chains of abstract values (see the discussion
on page 15). The above argument showing that abstract interpretation cannot
be exact on $P_2$ can be easily applied to abstract interpretation that
employs the widening operator. The main observation is that the
construction of the $j$-th step of the Kleene sequences for $P_1$ and $P_2$
using widening and the abstract domain $A$ is respectively given by

\[
\begin{array}{lcll}
d_{1,j}(\alpha) & = & (d_{1,j-1}(\alpha)) \,\nabla\, (\mathcal{F}_1^A(d_{1,j-1})(\alpha)), & \alpha = 1..2 \\
d_{2,j}(\alpha) & = & (d_{2,j-1}(\alpha)) \,\nabla\, (\mathcal{F}_2^A(d_{2,j-1})(\alpha)), & 1 \leq \alpha \leq i + 2
\end{array}
\]

where $\nabla$ denotes the widening operator employed. It is again easy to
show that $d_{1,j}(1) = d_{2,j}(1)$ and $d_{1,j}(2) = d_{2,j}(2)$ for
$j \leq i$, where, as before, $i$ is the smallest number such that the
$i$-th step of the Kleene sequence for $\mathcal{F}_1$ differs from
$d_{1,i}$. Hence, the analysis of $P_2$ using widening and the abstract
domain $A$ results in an association $d$ that does not correspond to the
exact collecting semantics of $P_2$. We remark that narrowing can be
subsequently applied to improve the accuracy of $d$, but in general it
cannot recover the exact collecting semantics of a monadic program.
Table of Notation


Notation                Explanation

VAR                     the set of program variables
X, Y, Z                 a program variable
$\Sigma$                the set of function symbols
f, g, h                 a data constructor or function symbol
b, c                    a constant (0-ary function symbol)
$f^{-1}_{(i)}$          the i-th projection of f
s, t                    a term (logic program or imperative program)
cond                    an imperative program condition
Stat                    an imperative program statement
Seq                     a sequence of imperative program statements
A, B, C                 a logic program atom
R                       a logic program rule
P                       a program (imperative or logic)
0, 7                    a program label
n, x                    a program point
v                       a program value
S                       a set (usually of program values)
$\rho$                  an environment (a mapping from VAR into values)
$\Theta$                a set of environments
$\rho[X \mapsto v]$     the environment $\rho$ except that X is mapped into v
$\theta$                a substitution (a mapping from VAR into terms)
E                       a conjunction of term equations
$\rho \models cond$     $\rho$ satisfies cond
$\rho \models E$        $\rho$ satisfies each equation in E
                        an environment variable
                        the environment variable for program point p,
Q                       a set environment
$Q[X \mapsto S]$        the set environment Q except that X is mapped into S
$\rho \models_Q cond$   $\rho$ satisfies cond in the context of Q
var(exp)                the program variables in exp if exp is a sequence of
                        program terms, rules or equation conjunctions
var(exp)                the set variables in exp if exp is a collection of
                        set expressions or set constraints
var($\alpha$)           the program variables in the rule R if $\alpha$ is
                        the label of R
var($\alpha$)           the program variables in the rule R if $\alpha$ is
                        the label of a body atom in R
$\mathcal{X}, \mathcal{Y}, \mathcal{Z}$   a set variable
Vn                      an intersection variable
se                      a set expression
a                       an atomic set expression
conj                    a conjunction of quantified conditions
{X : conj}              a quantified set expression
C                       a collection of set constraints
$\mathcal{I}$           a mapping from variables into sets
$\mathcal{I} \models C$ $\mathcal{I}$ is a model of C
lm(C)                   least model of the constraints C
$EC_P$                  the environment constraints for P
$CS_P$                  the collecting semantics of P
$sba_P$                 the least set based model of $EC_P$
$SC_P$                  the set constraints for P
S                       a transformation

Notes: 

1  Function  symbols  and  projection  symbols  are  overloaded  in  the  sense 
that  these  symbols  are  not  only  used  to  denote  mappings  from  values 
into  values,  but  they  are  also  used  in  set  expressions  as  mappings  from 
sets  into  sets.  The  intended  usage  is  usually  clear  from  context. 

2  A  value  is  a  data  structure  in  the  case  of  an  imperative  program  and 
a  ground  term  in  the  case  of  a  logic  program. 

3  For  logic  programs,  program  labels  and  program  points  coincide. 
quantified operator, 147
quantified  set  expression 
algorithm,  196-251 
definition,  147 
reduced  form,  198 
transformation,  223 

reduce(),  200 
regular tree grammar, 166
basic  algorithms,  169-172 

safeness  invariant,  205,  233 
set  based  approximation 
definition,  118 

set  based  interpretation,  116-121 
set  constraint  algorithm 

generic  algorithm,  176-179 
intersection-projection,  179-196 
correctness,  183-196 
transformations,  181-183 
overview,  172-173 
quantified  set  expression,  196- 
251 

correctness,  224-251 
transformations,  218-223 
set  constraints 

basic  properties,  149-150 
definite,  150 
definition,  144-149 
explicit  form,  168 
basic  algorithms,  169-172 
extensions  for  functional  pro¬ 
gramming,  320 
extensions  for  mode  analysis, 
283 

extensions  for  structure  shar¬ 
ing,  287 

motivation,  22-28 
standard  form,  173 


set  environment,  108 
set  expression 
atomic,  168 
standard  form,  173 
standard  form 

conversion  to,  174 
set  constraints,  173 
set  expression,  173 
standardize(), 174

transformations 

implementation,  262 
intersection-projection  algorithm, 
181-183 

intersection,  182 
projection,  182 
substitution,  181 
quantified  set  expression  alg., 
218-223 

intersection,  220 
projection,  219 
quantified  set  expression,  223 
substitution,  218 
soundness,  definition  of,  177 

unfolding  engine,  306 

widening,  15,  354 